flow-php / flow

Flow PHP - data processing framework
https://flow-php.com
MIT License

Return DOMDocument instead of DOMElement from XMLParserExtractor #1222

Closed norberttech closed 2 months ago

norberttech commented 2 months ago

Change Log

Added

Fixed

Changed

  • Return DOMDocument instead of DOMElement from XMLParserExtractor

Removed

Deprecated

Security


Description

This change is strictly about performance. When the extractor returns a DOMElement, the most commonly used XPath scalar function (usually applied to the node entry of a row) has to convert it back to a DOMDocument, which adds a few extra steps to the process. Returning a DOMDocument as the node might not be the most logical approach (returning a DOMElement feels more natural), but returning a DOMElement hurts performance, and that is not worth it.
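To illustrate the extra steps mentioned above, here is a minimal sketch in plain PHP using only the built-in DOM extension. The helper names `firstMatch` and `firstMatchFromElement` are hypothetical and are not part of Flow; the conversion shown (a new `DOMDocument` plus `importNode`) is an assumption about what wrapping a DOMElement typically involves, not a copy of Flow's implementation.

```php
<?php

declare(strict_types=1);

// Hypothetical helper (not Flow's code): returns the value of the first XPath match.
function firstMatch(\DOMDocument $document, string $path): ?string
{
    $nodes = (new \DOMXPath($document))->query($path);

    // query() returns false for an invalid expression, so guard before reading.
    if (!$nodes instanceof \DOMNodeList || $nodes->length === 0) {
        return null;
    }

    return $nodes->item(0)->nodeValue;
}

// Hypothetical helper: the additional work needed when only a DOMElement is available.
function firstMatchFromElement(\DOMElement $element, string $path): ?string
{
    // Extra step 1: allocate a fresh document.
    $document = new \DOMDocument();
    // Extra step 2: deep-copy the element into it and attach it as the root node.
    $document->appendChild($document->importNode($element, true));

    return firstMatch($document, $path);
}

$source = new \DOMDocument();
$source->loadXML('<item><id>1</id><name>flow</name></item>');

// Direct path: the document can be queried as-is.
$fromDocument = firstMatch($source, '/item/name');

// Element path: same result, but only after the conversion above.
$fromElement = firstMatchFromElement($source->documentElement, '/item/name');

var_dump($fromDocument === $fromElement);
```

Both calls resolve the same value; the second one pays for an extra allocation and a deep copy on every row, which is the overhead this change avoids.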

github-actions[bot] commented 2 months ago

Flow PHP - Benchmarks

Benchmark results from this PR are compared with the results from the 1.x branch.

Extractors

```shell
+-----------------------+-------------------+------+-----+-----------------+-------------------+------------------+
| benchmark             | subject           | revs | its | mem_peak        | mode              | rstdev           |
+-----------------------+-------------------+------+-----+-----------------+-------------------+------------------+
| CSVExtractorBench     | bench_extract_10k | 1    | 3   | 4.540mb +0.01%  | 508.519ms -0.06%  | ±0.36% -52.17%   |
| JsonExtractorBench    | bench_extract_10k | 1    | 3   | 4.655mb +0.01%  | 1.066s +0.89%     | ±0.37% -57.56%   |
| ParquetExtractorBench | bench_extract_10k | 1    | 3   | 29.111mb +0.00% | 432.843ms +0.35%  | ±1.41% -24.59%   |
| TextExtractorBench    | bench_extract_10k | 1    | 3   | 4.297mb +0.01%  | 34.062ms -1.34%   | ±1.60% -22.17%   |
| XmlExtractorBench     | bench_extract_10k | 1    | 3   | 4.287mb -0.19%  | 673.495ms -10.79% | ±3.51% +3155.97% |
+-----------------------+-------------------+------+-----+-----------------+-------------------+------------------+
```

Transformers

```shell
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| benchmark                   | subject                  | revs | its | mem_peak         | mode            | rstdev         |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
| RenameEntryTransformerBench | bench_transform_10k_rows | 1    | 3   | 116.573mb +0.00% | 59.032ms -1.18% | ±0.81% -11.62% |
+-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+
```

Loaders

```shell
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| benchmark          | subject        | revs | its | mem_peak         | mode             | rstdev          |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
| CSVLoaderBench     | bench_load_10k | 1    | 3   | 54.738mb +0.00%  | 142.287ms +2.41% | ±0.37% -66.41%  |
| JsonLoaderBench    | bench_load_10k | 1    | 3   | 90.347mb +0.00%  | 117.788ms +0.99% | ±1.23% +171.06% |
| ParquetLoaderBench | bench_load_10k | 1    | 3   | 124.466mb +0.00% | 1.257s +2.32%    | ±0.34% -82.08%  |
| TextLoaderBench    | bench_load_10k | 1    | 3   | 17.488mb +0.00%  | 44.589ms +1.72%  | ±0.39% -72.66%  |
+--------------------+----------------+------+-----+------------------+------------------+-----------------+
```

Building Blocks

```shell
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| benchmark               | subject                    | revs | its | mem_peak         | mode             | rstdev          |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 107.416mb +0.00% | 470.969ms -1.62% | ±0.56% -77.44%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 55.774mb +0.00%  | 238.154ms +1.81% | ±0.98% -70.00%  |
| NativeEntryFactoryBench | bench_entry_factory        | 1    | 3   | 14.612mb +0.00%  | 54.193ms +7.25%  | ±3.51% +212.02% |
| RowsBench               | bench_chunk_10_on_10k      | 2    | 3   | 87.329mb +0.00%  | 3.777ms +3.56%   | ±1.40% -55.79%  |
| RowsBench               | bench_diff_left_1k_on_10k  | 2    | 3   | 102.933mb +0.00% | 188.672ms +0.04% | ±0.42% +78.85%  |
| RowsBench               | bench_diff_right_1k_on_10k | 2    | 3   | 85.653mb +0.00%  | 19.146ms +0.33%  | ±0.82% -27.67%  |
| RowsBench               | bench_drop_1k_on_10k       | 2    | 3   | 88.569mb +0.00%  | 1.907ms +12.74%  | ±2.05% +180.67% |
| RowsBench               | bench_drop_right_1k_on_10k | 2    | 3   | 88.569mb +0.00%  | 1.846ms +7.02%   | ±2.63% +189.58% |
| RowsBench               | bench_entries_on_10k       | 2    | 3   | 85.681mb +0.00%  | 3.226ms +13.77%  | ±2.18% +129.06% |
| RowsBench               | bench_filter_on_10k        | 2    | 3   | 86.210mb +0.00%  | 17.058ms +3.86%  | ±1.49% +69.68%  |
| RowsBench               | bench_find_on_10k          | 2    | 3   | 86.210mb +0.00%  | 17.375ms +4.23%  | ±2.53% +626.76% |
| RowsBench               | bench_find_one_on_10k      | 10   | 3   | 84.114mb +0.00%  | 1.906μs +12.52%  | ±2.44% -13.79%  |
| RowsBench               | bench_first_on_10k         | 10   | 3   | 84.114mb +0.00%  | 0.400μs +33.33%  | ±0.00% +0.00%   |
| RowsBench               | bench_flat_map_on_1k       | 2    | 3   | 93.464mb +0.00%  | 13.218ms +6.47%  | ±0.85% -1.78%   |
| RowsBench               | bench_map_on_10k           | 2    | 3   | 122.835mb +0.00% | 62.667ms +1.45%  | ±1.54% +164.06% |
| RowsBench               | bench_merge_1k_on_10k      | 2    | 3   | 86.730mb +0.00%  | 1.802ms +21.30%  | ±2.83% +153.91% |
| RowsBench               | bench_partition_by_on_10k  | 2    | 3   | 90.086mb +0.00%  | 60.178ms +1.75%  | ±0.58% +2.54%   |
| RowsBench               | bench_remove_on_10k        | 2    | 3   | 88.832mb +0.00%  | 4.735ms +11.55%  | ±2.83% -11.51%  |
| RowsBench               | bench_sort_asc_on_1k       | 2    | 3   | 84.264mb +0.00%  | 40.636ms +2.91%  | ±1.59% +38.19%  |
| RowsBench               | bench_sort_by_on_1k        | 2    | 3   | 84.265mb +0.00%  | 41.442ms +6.01%  | ±1.16% -16.90%  |
| RowsBench               | bench_sort_desc_on_1k      | 2    | 3   | 84.264mb +0.00%  | 41.237ms +6.67%  | ±0.93% -20.12%  |
| RowsBench               | bench_sort_entries_on_1k   | 2    | 3   | 86.556mb +0.00%  | 7.431ms +0.50%   | ±1.56% +61.84%  |
| RowsBench               | bench_sort_on_1k           | 2    | 3   | 84.114mb +0.00%  | 28.836ms +0.55%  | ±0.63% -42.80%  |
| RowsBench               | bench_take_1k_on_10k       | 10   | 3   | 84.114mb +0.00%  | 13.807μs +2.88%  | ±2.27% +77.87%  |
| RowsBench               | bench_take_right_1k_on_10k | 10   | 3   | 84.114mb +0.00%  | 16.662μs +1.53%  | ±2.27% +40.54%  |
| RowsBench               | bench_unique_on_1k         | 2    | 3   | 102.934mb +0.00% | 192.158ms +0.59% | ±0.27% -46.31%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 53.219mb +0.00%  | 424.659ms +9.22% | ±0.34% +48.58%  |
| TypeDetectorBench       | bench_type_detector        | 1    | 3   | 13.485mb +0.00%  | 79.858ms -4.77%  | ±3.37% +71.33%  |
+-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+
```