flow-php / flow

Flow PHP - data processing framework
https://flow-php.com
MIT License

Fixed handling negative numbers in parquet binary reader/writer #1096

Closed norberttech closed 3 weeks ago

norberttech commented 3 weeks ago

Change Log

Added

  • support for the deprecated Parquet ConvertedType::INT_16, so that files generated by Amazon Redshift can be read

Fixed

  • handling of negative numbers in the Parquet binary reader/writer

github-actions[bot] commented 3 weeks ago

Flow PHP - Benchmarks

Benchmark results from this PR are compared with the results from the 1.x branch.

Extractors

```shell
+-----------------------+-------------------+------+-----+------------------+------------------+----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode             | rstdev         |
+-----------------------+-------------------+------+-----+------------------+------------------+----------------+
| AvroExtractorBench    | bench_extract_10k | 1    | 3   | 35.456mb +0.00%  | 818.450ms +4.17% | ±0.70% +55.26% |
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 5.175mb +0.00%   | 341.047ms +8.94% | ±0.68% -47.98% |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 5.208mb +0.00%   | 1.073s +7.08%    | ±1.36% -16.97% |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 135.879mb +0.00% | 750.140ms +7.86% | ±1.01% -43.44% |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.965mb +0.01%   | 35.671ms +6.78%  | ±0.87% -39.25% |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.971mb +0.01%   | 440.205ms +8.40% | ±0.43% -51.12% |
+-----------------------+-------------------+------+-----+------------------+------------------+----------------+
```

Transformers

```shell
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 116.272mb +0.00% | 59.862ms +8.14% | ±1.26% -62.35% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
```

Loaders

```shell
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev          |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| AvroLoaderBench    | bench_load_10k | 1    | 3   | 96.842mb +0.00%  | 456.481ms +5.62% | ±0.24% -86.21%  |
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 55.254mb +0.00%  | 67.209ms +6.13%  | ±0.81% +69.60%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 107.628mb +0.00% | 51.470ms +8.11%  | ±1.87% +264.42% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 227.048mb +0.00% | 1.425s +9.22%    | ±0.50% -56.79%  |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 18.009mb +0.00%  | 38.644ms +10.48% | ±0.26% -20.53%  |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
```

Building Blocks

```shell
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 87.091mb +0.00%  | 3.311ms +0.74%   | ±0.80% -42.66%  |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 102.689mb +0.00% | 185.909ms +7.42% | ±0.33% +35.48%  |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 85.409mb +0.00%  | 18.515ms +6.48%  | ±1.31% -62.52%  |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 88.331mb +0.00%  | 1.685ms +19.67%  | ±0.92% -70.32%  |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 88.331mb +0.00%  | 1.689ms +18.94%  | ±1.25% -7.83%   |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 85.443mb +0.00%  | 2.755ms +10.91%  | ±0.68% -10.62%  |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 85.972mb +0.00%  | 17.135ms -6.24%  | ±1.51% -52.02%  |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 85.972mb +0.00%  | 16.914ms -8.88%  | ±0.31% -87.07%  |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 83.876mb +0.00%  | 1.600μs +6.24%   | ±0.00% -100.00% |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 83.876mb +0.00%  | 0.300μs 0.00%    | ±0.00% 0.00%    |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 93.226mb +0.00%  | 12.294ms +6.44%  | ±0.54% -60.05%  |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 122.597mb +0.00% | 60.895ms +8.03%  | ±1.02% -21.08%  |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 86.492mb +0.00%  | 1.246ms +15.96%  | ±2.05% +1.88%   |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 89.839mb +0.00%  | 60.999ms +4.60%  | ±1.04% -55.83%  |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 88.593mb +0.00%  | 4.089ms +10.58%  | ±0.19% -90.57%  |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 84.020mb +0.00%  | 39.955ms +10.34% | ±1.14% +104.14% |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 84.020mb +0.00%  | 39.386ms +7.37%  | ±1.04% +65.59%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 84.020mb +0.00%  | 39.481ms +8.43%  | ±0.51% -34.13%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 86.317mb +0.00%  | 7.340ms +6.70%   | ±1.02% -66.63%  |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 83.876mb +0.00%  | 29.052ms +6.68%  | ±0.22% -80.47%  |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 83.876mb +0.00%  | 13.318μs +11.75% | ±1.06% -10.45%  |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 83.876mb +0.00%  | 16.601μs +15.29% | ±3.44% +0.00%   |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 102.690mb +0.00% | 190.504ms +3.86% | ±3.05% +480.97% |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 116.821mb +0.00% | 493.602ms +5.10% | ±0.97% -63.64%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 60.299mb +0.00%  | 250.716ms +9.05% | ±1.60% +63.86%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 15.233mb +0.00%  | 53.522ms +3.65%  | ±3.21% +48.23%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 60.000mb +0.00%  | 426.361ms +5.98% | ±0.70% -74.14%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 14.539mb +0.00%  | 90.858ms +14.37% | ±3.24% +634.44% |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
```