flow-php / flow

Flow PHP - data processing framework
https://flow-php.com
MIT License
404 stars 23 forks source link

Replace `XMLNodeEntry` with new `XMLElementEntry` #1068

Closed stloyd closed 1 month ago

stloyd commented 1 month ago

Change Log

Added

  • Add new XMLElementEntry

Fixed

Changed

Removed

  • Remove XMLNodeEntry

Deprecated

Security


Description

Using simply just DOMNode generates problems as we're unable to work with XML attributes, which are available on a class that extends DOMNode = DOMElement.

github-actions[bot] commented 1 month ago

Flow PHP - Benchmarks

Results of the benchmarks from this PR are compared with the results from 1.x branch.

Extractors ```shell +-----------------------+-------------------+------+-----+------------------+------------------+----------------+ | benchmark | subject | revs | its | mem_peak | mode | rstdev | +-----------------------+-------------------+------+-----+------------------+------------------+----------------+ | AvroExtractorBench | bench_extract_10k | 1 | 3 | 35.422mb -0.00% | 827.881ms -0.88% | ±0.88% -40.61% | | CSVExtractorBench | bench_extract_10k | 1 | 3 | 5.142mb -0.00% | 335.755ms -0.59% | ±0.21% -27.25% | | JsonExtractorBench | bench_extract_10k | 1 | 3 | 5.173mb -0.00% | 1.064s -0.40% | ±0.30% +1.37% | | ParquetExtractorBench | bench_extract_10k | 1 | 3 | 135.845mb -0.00% | 907.112ms -0.57% | ±0.27% -42.00% | | TextExtractorBench | bench_extract_10k | 1 | 3 | 4.933mb -0.00% | 36.220ms -0.31% | ±1.45% +22.81% | | XmlExtractorBench | bench_extract_10k | 1 | 3 | 4.939mb -0.00% | 434.069ms +0.06% | ±0.44% -77.02% | +-----------------------+-------------------+------+-----+------------------+------------------+----------------+ ```
Transformers ```shell +-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+ | benchmark | subject | revs | its | mem_peak | mode | rstdev | +-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+ | RenameEntryTransformerBench | bench_transform_10k_rows | 1 | 3 | 116.239mb -0.00% | 60.114ms -0.89% | ±0.12% -73.46% | +-----------------------------+--------------------------+------+-----+------------------+-----------------+----------------+ ```
Loaders ```shell +--------------------+----------------+------+-----+------------------+------------------+-----------------+ | benchmark | subject | revs | its | mem_peak | mode | rstdev | +--------------------+----------------+------+-----+------------------+------------------+-----------------+ | AvroLoaderBench | bench_load_10k | 1 | 3 | 96.807mb -0.00% | 455.370ms +1.86% | ±0.23% -40.62% | | CSVLoaderBench | bench_load_10k | 1 | 3 | 55.221mb -0.00% | 70.973ms +1.35% | ±2.11% +43.23% | | JsonLoaderBench | bench_load_10k | 1 | 3 | 107.593mb -0.00% | 51.935ms -4.07% | ±1.04% +186.07% | | ParquetLoaderBench | bench_load_10k | 1 | 3 | 227.013mb +0.00% | 1.418s -0.11% | ±0.35% -73.42% | | TextLoaderBench | bench_load_10k | 1 | 3 | 17.976mb -0.00% | 37.867ms -3.63% | ±0.42% -45.98% | +--------------------+----------------+------+-----+------------------+------------------+-----------------+ ```
Building Blocks ```shell +-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+ | benchmark | subject | revs | its | mem_peak | mode | rstdev | +-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+ | RowsBench | bench_chunk_10_on_10k | 2 | 3 | 87.060mb -0.00% | 3.293ms -7.62% | ±1.30% -40.42% | | RowsBench | bench_diff_left_1k_on_10k | 2 | 3 | 102.658mb -0.00% | 186.384ms -3.16% | ±0.58% -25.56% | | RowsBench | bench_diff_right_1k_on_10k | 2 | 3 | 85.378mb -0.00% | 18.791ms -1.74% | ±0.23% -13.01% | | RowsBench | bench_drop_1k_on_10k | 2 | 3 | 88.300mb -0.00% | 1.691ms -18.21% | ±1.15% +106.08% | | RowsBench | bench_drop_right_1k_on_10k | 2 | 3 | 88.300mb -0.00% | 1.692ms -13.16% | ±0.40% -83.61% | | RowsBench | bench_entries_on_10k | 2 | 3 | 85.412mb -0.00% | 2.486ms -10.59% | ±1.89% +41.20% | | RowsBench | bench_filter_on_10k | 2 | 3 | 85.941mb -0.00% | 15.729ms -4.69% | ±1.30% +45.36% | | RowsBench | bench_find_on_10k | 2 | 3 | 85.941mb -0.00% | 15.865ms -5.22% | ±1.20% +82.30% | | RowsBench | bench_find_one_on_10k | 10 | 3 | 83.845mb -0.00% | 1.600μs -19.76% | ±0.00% -100.00% | | RowsBench | bench_first_on_10k | 10 | 3 | 83.845mb -0.00% | 0.400μs 0.00% | ±0.00% 0.00% | | RowsBench | bench_flat_map_on_1k | 2 | 3 | 93.195mb -0.00% | 12.207ms -9.84% | ±1.40% +4.38% | | RowsBench | bench_map_on_10k | 2 | 3 | 122.566mb -0.00% | 60.924ms -2.27% | ±1.28% +376.36% | | RowsBench | bench_merge_1k_on_10k | 2 | 3 | 86.460mb -0.00% | 1.219ms -18.59% | ±2.41% -29.79% | | RowsBench | bench_partition_by_on_10k | 2 | 3 | 89.807mb -0.00% | 63.970ms -4.54% | ±1.44% +43.01% | | RowsBench | bench_remove_on_10k | 2 | 3 | 88.562mb -0.00% | 3.855ms -9.17% | ±0.39% -50.00% | | RowsBench | bench_sort_asc_on_1k | 2 | 3 | 83.988mb -0.00% | 38.933ms -1.69% | ±2.15% +78.38% | | RowsBench | bench_sort_by_on_1k | 2 | 3 | 83.989mb -0.00% | 39.210ms -1.91% | ±0.78% -70.27% | | RowsBench | bench_sort_desc_on_1k | 2 | 3 | 83.988mb -0.00% | 39.330ms -2.39% | ±1.43% +84.54% | | RowsBench | bench_sort_entries_on_1k | 2 | 3 | 86.286mb -0.00% | 7.322ms -1.70% | ±0.61% -19.46% | | RowsBench | bench_sort_on_1k | 2 | 3 | 83.845mb -0.00% | 28.319ms -2.01% | ±1.62% +488.15% | | RowsBench | bench_take_1k_on_10k | 10 | 3 | 83.845mb -0.00% | 13.079μs -5.43% | ±0.95% -43.73% | | RowsBench | bench_take_right_1k_on_10k | 10 | 3 | 83.845mb -0.00% | 15.666μs -5.38% | ±1.31% -35.22% | | RowsBench | bench_unique_on_1k | 2 | 3 | 102.659mb -0.00% | 191.602ms -0.60% | ±0.63% +7.05% | | NativeEntryFactoryBench | bench_entry_factory | 1 | 3 | 116.789mb -0.00% | 497.881ms -2.58% | ±1.83% +1.66% | | NativeEntryFactoryBench | bench_entry_factory | 1 | 3 | 60.267mb -0.00% | 246.408ms -1.86% | ±1.79% +27.77% | | NativeEntryFactoryBench | bench_entry_factory | 1 | 3 | 15.201mb -0.00% | 53.527ms -1.37% | ±0.28% -57.15% | | TypeDetectorBench | bench_type_detector | 1 | 3 | 59.974mb +0.00% | 463.252ms +4.71% | ±3.12% +138.36% | | TypeDetectorBench | bench_type_detector | 1 | 3 | 14.514mb +0.00% | 85.334ms -3.55% | ±0.99% -63.13% | +-------------------------+----------------------------+------+-----+------------------+------------------+-----------------+ ```
norberttech commented 1 month ago

I would simply remove DOMNodeEntry and replace it with DOMElementEntry, I fucked up a bit at the beginning not reading properly definition of DOMNode which can be anything, including the attribute, DOMElement is what I was actually thinking about but since it wasn't usable I don't think it's a huge BC Break.

About attributes, I think it should be handled out of the scope of this PR since it's more XMLWriter responsibility rather than DOMElementEntry to convert entries into elements/attributes.

\Flow\ETL\Function\XPath you might need to look into this one as well since it returns DOMNode which is later converted by the native entry factory into DOMNodeEntry but I think it should be pretty straightforward to cast it into DOMElement. Afaik it's the only scalar function that operates on XML.