flow-php / flow

Flow PHP - data processing framework
https://flow-php.com
MIT License

Added Schema::match() with strict/evolving matchers #1027

Closed norberttech closed 3 months ago

norberttech commented 3 months ago

Change Log

Added

  • Added Schema::match() with strict/evolving matchers

Fixed

Changed

Removed

Deprecated

Security


Description

This is an introduction to a potential schema evolution feature. Schema evolution is a process in which, as we write more data to a dataset, we make sure that reading from it remains safe for its clients. The base assumptions are as follows:

however if we:

we can impact our dataset clients.

Using Schema::matches($schema, schema_evolving_matcher()): bool should prevent us from introducing any backward compatibility (BC) breaks in our datasets.
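To make the difference between strict and evolving matching concrete, here is a minimal standalone sketch. It does not use Flow's classes: `strict_match()` and `evolving_match()` are hypothetical helpers, a schema is modelled as a plain array of column name to type name, and the evolving rules shown (removing a column or changing its type is a break, adding a column is accepted) are assumptions made for illustration, not necessarily the exact rules implemented in this PR.

```php
<?php

declare(strict_types=1);

// Standalone illustration only: these functions are hypothetical and are NOT Flow's API.
// A schema is modelled as an array of column name => type name.

/** Strict matching: both schemas must list exactly the same columns with the same types. */
function strict_match(array $current, array $incoming): bool
{
    // Sort by column name so that column order does not affect the comparison.
    ksort($current);
    ksort($incoming);

    return $current === $incoming;
}

/**
 * Evolving matching (assumed rules): every column that readers already know
 * must still exist with the same type; extra columns in $incoming are accepted.
 */
function evolving_match(array $current, array $incoming): bool
{
    foreach ($current as $name => $type) {
        if (!array_key_exists($name, $incoming)) {
            return false; // column removed: existing readers would break
        }

        if ($incoming[$name] !== $type) {
            return false; // type changed: existing readers would break
        }
    }

    return true;
}

$current = ['id' => 'integer', 'name' => 'string'];
$withNewColumn = ['id' => 'integer', 'name' => 'string', 'email' => 'string'];
$withChangedType = ['id' => 'string', 'name' => 'string'];
$withRemovedColumn = ['id' => 'integer'];

var_dump(strict_match($current, $withNewColumn));       // bool(false)
var_dump(evolving_match($current, $withNewColumn));     // bool(true)
var_dump(evolving_match($current, $withChangedType));   // bool(false)
var_dump(evolving_match($current, $withRemovedColumn)); // bool(false)
```

In this sketch the strict check rejects any difference at all, while the evolving check only rejects changes that would affect columns existing readers already depend on.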

github-actions[bot] commented 3 months ago

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from the 1.x branch.

Extractors

```shell
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| benchmark             | subject           | revs | its | mem_peak         | mode             | rstdev          |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
| AvroExtractorBench    | bench_extract_10k | 1    | 3   | 35.280mb +0.01%  | 825.295ms +0.66% | ±1.79% +322.27% |
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 5.003mb +0.05%   | 340.413ms -0.46% | ±0.36% -79.27%  |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 5.153mb +0.05%   | 1.054s +0.89%    | ±0.87% -6.79%   |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 135.828mb +0.00% | 897.988ms +0.04% | ±0.90% -13.10%  |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.910mb +0.05%   | 35.667ms +1.63%  | ±0.58% -31.80%  |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.916mb +0.05%   | 431.594ms -0.43% | ±0.13% -93.35%  |
+-----------------------+-------------------+------+-----+------------------+------------------+-----------------+
```
Transformers

```shell
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 110.617mb +0.00% | 64.789ms +2.81% | ±1.83% +22.72% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
```
Loaders

```shell
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev          |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| AvroLoaderBench    | bench_load_10k | 1    | 3   | 95.660mb +0.00%  | 466.758ms +0.66% | ±0.62% -4.44%   |
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.142mb +0.00%  | 71.865ms +0.78%  | ±0.69% -42.49%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 106.568mb +0.00% | 53.708ms +1.81%  | ±1.18% +74.60%  |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 224.390mb +0.00% | 1.427s +0.75%    | ±0.33% -28.27%  |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.957mb +0.01%  | 40.234ms +2.49%  | ±2.26% +211.58% |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
```
Building Blocks

```shell
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 76.683mb +0.00%  | 3.578ms +8.46%   | ±2.44% +60.25%  |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 96.410mb +0.00%  | 182.663ms -0.11% | ±0.25% -62.09%  |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 74.936mb +0.00%  | 18.484ms -0.60%  | ±0.54% -64.54%  |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 77.923mb +0.00%  | 1.838ms +18.24%  | ±1.14% -48.74%  |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 77.923mb +0.00%  | 1.876ms +21.24%  | ±3.49% +42.47%  |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 75.035mb +0.00%  | 3.087ms +23.74%  | ±3.05% +114.62% |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 75.564mb +0.00%  | 15.475ms +3.30%  | ±1.01% -67.66%  |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 75.564mb +0.00%  | 15.109ms +1.55%  | ±1.23% +422.99% |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 73.468mb +0.00%  | 1.994μs +16.87%  | ±2.40% -11.86%  |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 73.468mb +0.00%  | 0.400μs 0.00%    | ±0.00% 0.00%    |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 87.023mb +0.00%  | 13.015ms +1.64%  | ±0.86% +58.29%  |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 116.384mb +0.00% | 66.999ms +3.87%  | ±3.23% +131.11% |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 76.084mb +0.00%  | 1.328ms +17.58%  | ±3.27% +331.16% |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 79.431mb +0.00%  | 58.061ms +1.01%  | ±1.36% +73.90%  |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 78.185mb +0.00%  | 3.917ms +3.19%   | ±3.36% +165.13% |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 73.546mb +0.00%  | 40.598ms -0.56%  | ±1.84% -6.93%   |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 73.547mb +0.00%  | 39.941ms -0.73%  | ±0.19% -42.42%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 73.546mb +0.00%  | 40.295ms -0.76%  | ±0.82% -33.39%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 75.910mb +0.00%  | 7.365ms +0.54%   | ±0.64% -55.59%  |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 73.468mb +0.00%  | 29.045ms +0.13%  | ±1.75% +61.61%  |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 73.468mb +0.00%  | 13.300μs +0.14%  | ±0.61% -42.70%  |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 73.468mb +0.00%  | 16.060μs +1.64%  | ±2.09% +305.22% |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 96.477mb +0.00%  | 184.570ms -0.98% | ±0.53% -37.55%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 116.715mb +0.00% | 483.970ms -1.30% | ±1.15% +158.10% |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 60.193mb +0.00%  | 249.509ms -0.73% | ±0.76% -23.09%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 15.128mb +0.02%  | 53.014ms +3.79%  | ±1.35% +114.19% |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 59.959mb +0.00%  | 435.690ms +2.15% | ±0.41% +98.14%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 14.498mb +0.02%  | 86.116ms +1.31%  | ±1.16% +983.98% |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
```