flow-php / flow

Flow PHP - data processing framework
https://flow-php.com
MIT License
404 stars 23 forks source link

Removed null type #1033

Closed norberttech closed 3 months ago

norberttech commented 3 months ago

Change Log

Added

Fixed

Changed

Removed

  • Removed null type

Deprecated

Security


Description

Null as a EntryType was always problematic, mostly due to simple fact that in most data sources/sinks it simply does not exists as a standalone type. For example in databases, we can have nullable columns but not null fields. Same with Parquet and Avro, we can have nullable strings/integeres/maps/etc but we don't have null column type.

That was creating a very inconsistent behavior when reading from those strongly typed data srouces. Reading a nullable boolean column from parquet where all values are null we would get NullEntry in Flow Schema, instead of getting BoolEntry that is nullable. That was similarly problematic when we wanted to write into parquet a nullable column with only null values in a given row group.

Another indication that this change was necessary is how well it worked with Entry Type while creating entries.

github-actions[bot] commented 3 months ago

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors ```shell +-----------------------+-------------------+------+-----+------------------+------------------+-----------------+ | benchmark | subject | revs | its | mem_peak | mode | rstdev | +-----------------------+-------------------+------+-----+------------------+------------------+-----------------+ | AvroExtractorBench | bench_extract_10k | 1 | 3 | 35.282mb +0.01% | 838.205ms +0.96% | ±1.63% -39.84% | | CSVExtractorBench | bench_extract_10k | 1 | 3 | 5.005mb +0.04% | 341.535ms -0.23% | ±0.52% +688.28% | | JsonExtractorBench | bench_extract_10k | 1 | 3 | 5.156mb +0.11% | 1.066s +2.13% | ±0.63% +140.39% | | ParquetExtractorBench | bench_extract_10k | 1 | 3 | 135.831mb -0.00% | 905.395ms +1.11% | ±0.91% -8.00% | | TextExtractorBench | bench_extract_10k | 1 | 3 | 4.913mb +0.09% | 35.967ms +1.51% | ±1.12% +96.31% | | XmlExtractorBench | bench_extract_10k | 1 | 3 | 4.919mb +0.09% | 433.765ms -1.68% | ±0.40% -61.67% | +-----------------------+-------------------+------+-----+------------------+------------------+-----------------+ ```
Transformers ```shell +-----------------------------+--------------------------+------+-----+------------------+-----------------+---------------+ | benchmark | subject | revs | its | mem_peak | mode | rstdev | +-----------------------------+--------------------------+------+-----+------------------+-----------------+---------------+ | RenameEntryTransformerBench | bench_transform_10k_rows | 1 | 3 | 110.619mb +5.07% | 60.784ms -7.74% | ±0.74% -5.95% | +-----------------------------+--------------------------+------+-----+------------------+-----------------+---------------+ ```
Loaders ```shell +--------------------+----------------+------+-----+------------------+------------------+-----------------+ | benchmark | subject | revs | its | mem_peak | mode | rstdev | +--------------------+----------------+------+-----+------------------+------------------+-----------------+ | AvroLoaderBench | bench_load_10k | 1 | 3 | 95.663mb +1.06% | 457.220ms -3.61% | ±0.54% -45.75% | | CSVLoaderBench | bench_load_10k | 1 | 3 | 54.144mb +1.85% | 68.352ms -6.42% | ±0.41% -45.81% | | JsonLoaderBench | bench_load_10k | 1 | 3 | 106.570mb +0.95% | 50.106ms -6.15% | ±1.21% +55.96% | | ParquetLoaderBench | bench_load_10k | 1 | 3 | 224.394mb +1.16% | 1.397s -3.24% | ±0.42% -23.12% | | TextLoaderBench | bench_load_10k | 1 | 3 | 17.960mb +0.03% | 39.183ms -5.39% | ±1.10% +825.95% | +--------------------+----------------+------+-----+------------------+------------------+-----------------+ ```
Building Blocks ```shell +-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+ | benchmark | subject | revs | its | mem_peak | mode | rstdev | +-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+ | RowsBench | bench_chunk_10_on_10k | 2 | 3 | 76.686mb +13.52% | 3.219ms -5.30% | ±1.43% +445.59% | | RowsBench | bench_diff_left_1k_on_10k | 2 | 3 | 96.413mb +6.47% | 187.699ms +0.22% | ±1.15% +217.23% | | RowsBench | bench_diff_right_1k_on_10k | 2 | 3 | 74.938mb +13.92% | 18.926ms -1.80% | ±0.69% -24.00% | | RowsBench | bench_drop_1k_on_10k | 2 | 3 | 77.926mb +13.30% | 1.654ms -10.28% | ±0.87% -76.41% | | RowsBench | bench_drop_right_1k_on_10k | 2 | 3 | 77.926mb +13.30% | 1.639ms -17.40% | ±0.71% -69.47% | | RowsBench | bench_entries_on_10k | 2 | 3 | 75.038mb +13.81% | 2.464ms -10.41% | ±2.81% +383.25% | | RowsBench | bench_filter_on_10k | 2 | 3 | 75.567mb +13.72% | 16.100ms +7.24% | ±0.87% +206.49% | | RowsBench | bench_find_on_10k | 2 | 3 | 75.567mb +13.72% | 16.326ms +6.85% | ±0.71% -48.81% | | RowsBench | bench_find_one_on_10k | 10 | 3 | 73.471mb +14.11% | 1.706μs -5.22% | ±2.72% +0.00% | | RowsBench | bench_first_on_10k | 10 | 3 | 73.471mb +14.11% | 0.400μs 0.00% | ±0.00% 0.00% | | RowsBench | bench_flat_map_on_1k | 2 | 3 | 87.025mb +7.08% | 12.717ms -2.75% | ±0.80% +340.88% | | RowsBench | bench_map_on_10k | 2 | 3 | 116.386mb +5.30% | 60.666ms -10.05% | ±0.66% -24.34% | | RowsBench | bench_merge_1k_on_10k | 2 | 3 | 76.086mb +13.62% | 1.193ms -19.06% | ±1.03% -62.47% | | RowsBench | bench_partition_by_on_10k | 2 | 3 | 79.433mb +13.05% | 62.706ms +4.58% | ±2.21% +925.90% | | RowsBench | bench_remove_on_10k | 2 | 3 | 78.188mb +13.26% | 3.792ms -7.16% | ±0.48% -42.91% | | RowsBench | bench_sort_asc_on_1k | 2 | 3 | 73.549mb +14.09% | 40.137ms -0.46% | ±0.98% +221.08% | | RowsBench | bench_sort_by_on_1k | 2 | 3 | 73.550mb +14.09% | 40.048ms -1.66% | ±0.65% +43.45% | | RowsBench | bench_sort_desc_on_1k | 2 | 3 | 73.549mb +14.09% | 40.227ms -1.84% | ±0.48% -23.46% | | RowsBench | bench_sort_entries_on_1k | 2 | 3 | 75.912mb +13.65% | 7.372ms -3.54% | ±0.88% +164.01% | | RowsBench | bench_sort_on_1k | 2 | 3 | 73.471mb +14.11% | 28.914ms -1.31% | ±1.46% -1.05% | | RowsBench | bench_take_1k_on_10k | 10 | 3 | 73.471mb +14.11% | 13.482μs -1.47% | ±1.06% -32.58% | | RowsBench | bench_take_right_1k_on_10k | 10 | 3 | 73.471mb +14.11% | 16.212μs +0.82% | ±1.63% +55.38% | | RowsBench | bench_unique_on_1k | 2 | 3 | 96.479mb +6.40% | 190.532ms -0.37% | ±0.90% +22.64% | | NativeEntryFactoryBench | bench_entry_factory | 1 | 3 | 116.718mb +0.01% | 507.385ms +4.74% | ±0.22% -69.62% | | NativeEntryFactoryBench | bench_entry_factory | 1 | 3 | 60.196mb +0.02% | 255.960ms +3.95% | ±0.29% -43.63% | | NativeEntryFactoryBench | bench_entry_factory | 1 | 3 | 15.130mb +0.06% | 54.362ms +3.22% | ±0.94% -24.98% | | TypeDetectorBench | bench_type_detector | 1 | 3 | 59.961mb -0.00% | 428.140ms -1.43% | ±0.34% +46.03% | | TypeDetectorBench | bench_type_detector | 1 | 3 | 14.500mb -0.00% | 85.088ms -2.54% | ±0.56% +563.11% | +-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+ ```