OCA / openupgradelib

A library with support functions to be called from Odoo migration scripts.
GNU Affero General Public License v3.0
85 stars 171 forks source link

[IMP] convert_html_fragment: prevent lxml wraps and false positives #349

Closed chienandalu closed 10 months ago

chienandalu commented 10 months ago

When the fragment has no common node, lxml wraps the code under a common one. This content for instance:

<p><p/><p></p>

is parsed as:

<div><p><p/><p></p></div>

To avoid this we force a custom wrapper tag on every parsed string so every xml receives the same treatment and we can extract it later with no harm.

We don't want to update any fragment which has no changes after all the replacements are checked. For that cases we'll return the original string.

At the end, we just trim our initial custom wrapper tag and return our treated string.

I made this test case and tested it in a v16 shell:

from openupgradelib.openupgrade_160 import convert_string_bootstrap_4to5

# test_regular_no_change
html_string = "<div><p></p><p></p></div>"
html_string_parsed = convert_string_bootstrap_4to5(html_string)
assert html_string == html_string_parsed

# test_multi_root_no_change
html_string = "<p></p><p></p>"
html_string_parsed = convert_string_bootstrap_4to5(html_string)
assert html_string == html_string_parsed

# test_regular_with_change
html_string = "<div><p data-html='test'></p><p data-display='test'></p></div>"
expected_output = '<div><p data-bs-html="test"/><p data-bs-display="test"/></div>'
# Turn off pretty-print for easier comparison
html_string_parsed = convert_string_bootstrap_4to5(html_string, pretty_print=False)
assert expected_output == html_string_parsed

# test_multi_root_with_change
html_string = "<p data-html='test'></p><p data-display='test'></p>"
expected_output = '<p data-bs-html="test"/><p data-bs-display="test"/>'
# Turn off pretty-print for easier comparison
html_string_parsed = convert_string_bootstrap_4to5(html_string, pretty_print=False)
assert expected_output == html_string_parsed

cc @Tecnativa TT44169 TT45734

please take a look @pedrobaeza