flow-php / flow

Flow PHP - data processing framework
https://flow-php.com
MIT License

Add support for ZSTD compression #1105

Closed: flavioheleno closed this 2 days ago

flavioheleno commented 2 days ago

Change Log

Added

  • zstd compression functions

Fixed

Changed

Removed

Deprecated

Security


Description

This PR adds support for ZSTD compression to the Parquet codec.

Closes #782.

github-actions[bot] commented 2 days ago

Flow PHP - Benchmarks

Benchmark results from this PR are compared with the results from the 1.x branch.

Extractors

```shell
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode             | rstdev          |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 3.910mb +0.04%   | 510.027ms +0.18% | ±2.28% +372.28% |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 3.943mb +0.04%   | 1.071s +0.83%    | ±0.44% -32.66%  |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 135.375mb +0.00% | 759.438ms +0.96% | ±0.60% -65.62%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 3.670mb +0.04%   | 33.799ms -0.66%  | ±0.81% +47.02%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 3.617mb +0.04%   | 434.124ms -1.05% | ±0.35% -83.45%  |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
```

Transformers

```shell
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 115.962mb +0.00% | 61.712ms +1.10% | ±0.28% -83.79% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
```

Loaders

```shell
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode            | rstdev          |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.065mb +0.00%  | 86.840ms +0.06% | ±1.13% +17.75%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 106.498mb +0.00% | 53.481ms -1.31% | ±0.46% -29.82%  |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 225.831mb +0.00% | 1.430s -0.17%   | ±1.10% -0.78%   |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 16.859mb +0.01%  | 44.167ms +0.33% | ±1.60% +349.74% |
+--------------------+----------------+------+-----+------------------+-----------------+-----------------+
```

Building Blocks

```shell
+-------------------------+----------------------------+------+-----+------------------+------------------+-------------------------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev                        |
+-------------------------+----------------------------+------+-----+------------------+------------------+-------------------------------+
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 116.513mb +0.00% | 501.640ms +0.22% | ±0.55% -60.87%                |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 59.991mb +0.00%  | 253.276ms +1.80% | ±0.99% +66.05%                |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.924mb +0.01%  | 56.272ms +3.12%  | ±1.81% -41.09%                |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 59.691mb +0.00%  | 466.663ms +7.69% | ±1.75% +86.05%                |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 14.230mb +0.01%  | 88.526ms +2.12%  | ±0.35% -71.15%                |
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 86.782mb +0.00%  | 3.814ms +4.89%   | ±1.62% -41.38%                |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 102.380mb +0.00% | 188.975ms -0.62% | ±0.87% +210.95%               |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 85.100mb +0.00%  | 19.513ms +1.37%  | ±1.49% +349.35%               |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 88.022mb +0.00%  | 2.150ms +4.47%   | ±3.58% +3175.98%              |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 88.022mb +0.00%  | 1.900ms -2.32%   | ±0.93% -74.61%                |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 85.134mb +0.00%  | 2.945ms -0.65%   | ±3.66% +331.95%               |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 85.663mb +0.00%  | 15.247ms -9.91%  | ±2.72% +348.49%               |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 85.663mb +0.00%  | 15.312ms -7.97%  | ±1.56% +30.05%                |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 83.568mb +0.00%  | 1.906μs +0.32%   | ±2.44% +20864134789308000.00% |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 83.568mb +0.00%  | 0.400μs 0.00%    | ±0.00% 0.00%                  |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 92.917mb +0.00%  | 12.751ms +0.75%  | ±3.69% +92.95%                |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 122.288mb +0.00% | 63.123ms +0.31%  | ±0.96% +43.92%                |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 86.183mb +0.00%  | 1.784ms +10.30%  | ±1.29% -42.89%                |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 89.530mb +0.00%  | 65.046ms +1.33%  | ±0.68% -9.30%                 |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 88.284mb +0.00%  | 4.203ms -4.59%   | ±1.31% -56.75%                |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 83.645mb +0.08%  | 39.955ms +0.18%  | ±1.62% +3.54%                 |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 83.646mb +0.08%  | 41.468ms +0.58%  | ±1.56% -12.87%                |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 83.645mb +0.08%  | 40.520ms +2.26%  | ±1.96% -31.32%                |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 86.009mb +0.00%  | 7.525ms -5.29%   | ±3.49% +47.20%                |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 83.568mb +0.00%  | 29.146ms +1.85%  | ±0.92% -15.66%                |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 83.568mb +0.00%  | 14.817μs +7.21%  | ±1.47% +62.21%                |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 83.568mb +0.00%  | 17.699μs +6.14%  | ±1.38% -55.27%                |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 102.382mb +0.00% | 195.093ms +0.56% | ±1.47% +382.01%               |
+-------------------------+----------------------------+------+-----+------------------+------------------+-------------------------------+
```

norberttech commented 2 days ago

that was quick, thank you @flavioheleno 🍻