flow-php / flow

Flow PHP - data processing framework
https://flow-php.com
MIT License

Updated parquet thrift definitions #1251

Closed: norberttech closed this 1 month ago

norberttech commented 1 month ago

Change Log

Added

  • composer script for building thrift definitions

Fixed

  • Setting repetition level based on flow schema definition in parquet converter

Changed

  • Updated parquet thrift definitions
  • ParquetLoader will now make inferred schema nullable

Removed

Deprecated

Security


Description

github-actions[bot] commented 1 month ago

Flow PHP - Benchmarks

Benchmark results from this PR are compared with the results from the 1.x branch.

Extractors
```shell
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak        | mode             | rstdev          |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.619mb +0.02%  | 517.475ms -0.53% | ±0.30% -14.48%  |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 4.707mb +0.02%  | 1.086s +0.11%    | ±3.49% +171.67% |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 29.163mb +0.52% | 436.328ms -0.67% | ±0.74% -26.69%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.348mb +0.02%  | 33.281ms -1.48%  | ±1.12% -33.71%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.329mb +0.02%  | 652.791ms +0.09% | ±0.60% -79.95%  |
+-----------------------+-------------------+------+-----+-----------------+------------------+-----------------+
```
Transformers
```shell
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 116.623mb +0.00% | 61.261ms +2.88% | ±0.81% +14.08% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
```
Loaders
```shell
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev          |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.817mb +0.00%  | 141.489ms -0.10% | ±1.02% -48.25%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 90.401mb +0.00%  | 118.864ms +1.16% | ±1.43% +16.50%  |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 124.454mb +0.18% | 1.263s +2.03%    | ±1.10% +516.19% |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.538mb +0.00%  | 44.474ms +1.26%  | ±0.43% +101.11% |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
```
Building Blocks
```shell
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 87.371mb +0.00%  | 3.986ms +12.10%  | ±1.81% -43.19%  |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 102.975mb +0.00% | 189.290ms -1.94% | ±0.83% -55.06%  |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 85.695mb +0.00%  | 19.198ms +2.28%  | ±1.40% +327.71% |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 88.611mb +0.00%  | 1.893ms +8.72%   | ±1.40% -61.86%  |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 88.611mb +0.00%  | 2.001ms +12.26%  | ±3.42% +192.28% |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 85.723mb +0.00%  | 3.178ms +9.31%   | ±1.66% +77.12%  |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 86.252mb +0.00%  | 15.722ms +1.50%  | ±0.93% -36.32%  |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 86.252mb +0.00%  | 15.723ms +2.79%  | ±2.00% +167.75% |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 84.156mb +0.00%  | 1.994μs +5.28%   | ±2.40% -5.08%   |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 84.156mb +0.00%  | 0.400μs +33.33%  | ±0.00% +0.00%   |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 93.506mb +0.00%  | 14.313ms +11.63% | ±2.41% +226.09% |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 122.877mb +0.00% | 64.182ms +1.86%  | ±1.84% +77.89%  |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 86.772mb +0.00%  | 1.918ms +18.24%  | ±1.27% -49.31%  |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 90.124mb +0.00%  | 65.787ms -0.90%  | ±0.91% -35.38%  |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 88.873mb +0.00%  | 4.943ms +13.67%  | ±0.63% -17.45%  |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 84.305mb +0.00%  | 42.743ms +5.42%  | ±1.45% +64.91%  |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 84.306mb +0.00%  | 43.898ms +4.18%  | ±1.27% -58.47%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 84.305mb +0.00%  | 41.981ms +1.64%  | ±1.42% +34.21%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 86.598mb +0.00%  | 7.829ms +6.01%   | ±1.34% -46.04%  |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 84.156mb +0.00%  | 30.347ms +4.18%  | ±1.67% +127.26% |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 84.156mb +0.00%  | 14.407μs +7.51%  | ±2.17% +256.43% |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 84.156mb +0.00%  | 16.820μs +5.12%  | ±1.02% +99.04%  |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 102.976mb +0.00% | 189.176ms -1.20% | ±1.32% +50.55%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 53.197mb +0.00%  | 396.474ms +0.54% | ±0.71% +122.13% |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 13.463mb +0.01%  | 80.498ms -0.41%  | ±1.54% +44.91%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 107.460mb +0.00% | 481.473ms -0.35% | ±1.49% -5.57%   |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.818mb +0.00%  | 248.071ms +1.46% | ±1.66% +336.25% |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.656mb +0.00%  | 55.064ms +4.76%  | ±2.04% +105.14% |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
```
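
The tables report each PR value followed by a signed percentage but do not show the 1.x baseline itself. As a rough aid for reading them, the sketch below recovers an approximate baseline from one reported cell. It assumes the percentage is the relative change against the baseline, `(value - baseline) / baseline * 100`. That reading is not confirmed anywhere in this thread, and the function names are hypothetical helpers, not part of Flow or the benchmark tooling.

```python
def relative_change(baseline: float, value: float) -> float:
    """Relative change of value against baseline, in percent (assumed definition)."""
    return (value - baseline) / baseline * 100


def baseline_from_diff(value: float, diff_percent: float) -> float:
    """Invert relative_change: recover the baseline from a value and its reported diff."""
    return value / (1 + diff_percent / 100)


# ParquetExtractorBench mem_peak as reported in the Extractors table: 29.163mb +0.52%
pr_value_mb = 29.163
reported_diff = 0.52

baseline_mb = baseline_from_diff(pr_value_mb, reported_diff)
print(f"approximate 1.x baseline: {baseline_mb:.3f}mb")
print(f"round-trip diff: {relative_change(baseline_mb, pr_value_mb):+.2f}%")
```

Under the same assumption, a large swing in the rstdev column such as `±1.10% +516.19%` corresponds to a baseline rstdev of roughly 0.18%, so the absolute spread stays small even where the relative change looks dramatic.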