HtmlUnit / htmlunit-neko

HtmlUnit adaptation of NekoHtml
Apache License 2.0

Lowercase property gets ignored in DOM parser #127

Open radkovo opened 1 week ago

radkovo commented 1 week ago

When the property http://cyberneko.org/html/properties/names/elems is set to lower, the DOM parser still returns uppercase element names.

The following code reproduces the problem:

InputStream is = ...; // HTML input stream

DOMParser parser = new DOMParser(HTMLDocumentImpl.class);
parser.setProperty("http://cyberneko.org/html/properties/names/elems", "lower");
parser.setProperty("http://cyberneko.org/html/properties/names/attrs", "lower");

parser.parse(new org.xml.sax.InputSource(is));
org.w3c.dom.Document doc = parser.getDocument();
// prints out 'lower'
System.out.println(parser.getXMLParserConfiguration().getProperty("http://cyberneko.org/html/properties/names/elems"));
// prints out 'HTML'
System.out.println(doc.getDocumentElement().getTagName());

The expected result (which worked with the old nekohtml) is that the last line prints html instead of HTML.
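For illustration only, the expected effect of the names/elems setting can be sketched as a plain string transformation. This is not the parser's code: the class NameCaseSketch and the method applyCase are hypothetical names, and treating any value other than "upper" or "lower" as "leave the name unchanged" is an assumption.

```java
import java.util.Locale;

public class NameCaseSketch {

    // Hypothetical helper: applies a names/elems-style mode to a tag name.
    // Assumption: any mode other than "upper" or "lower" leaves the name as-is.
    public static String applyCase(String name, String mode) {
        if ("lower".equals(mode)) {
            return name.toLowerCase(Locale.ROOT);
        }
        if ("upper".equals(mode)) {
            return name.toUpperCase(Locale.ROOT);
        }
        return name;
    }

    public static void main(String[] args) {
        // With "lower", the root element name from the report should come out as html.
        System.out.println(applyCase("HTML", "lower"));
    }
}
```

With the mode set to lower, a root element written as HTML or Html in the input would be reported as html, which is the behavior the report expects from getTagName().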

rbri commented 1 week ago

@radkovo I fear you are right - I have to write some tests and check what triggered the change.

I will make a new snapshot, hopefully soon.

rbri commented 1 week ago

@radkovo sorry for that stupid bug. It was a regression from a major update we did some versions ago, and at that time there was no real unit test for that part of the implementation. The tests were added later, and we used the wrong expectations. Hopefully everything is fixed now.

Please try the latest snapshot build 4.7.0-SNAPSHOT

radkovo commented 1 week ago

Many thanks for the quick response. It's much better, almost all my tests pass now :-) However, it seems that the last change breaks (at least) getElementsByTagName(): in HTMLDocumentImpl, the requested tag name is converted to uppercase, and later the corresponding node list, DeepNodeListImpl, compares that uppercase name with getTagName(). The comparison always fails because the DOM tag names have been converted to lowercase.
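The mismatch described above can be reproduced in miniature with the JDK's built-in XML DOM, which matches tag names case-sensitively. This is only a sketch of the failure mode, not the htmlunit-neko code: the class TagNameMatchSketch and its methods are hypothetical, and lowercasing the requested name before the lookup is just one possible way to line it up with lowercase tag names, not necessarily the fix that went into the snapshot.

```java
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TagNameMatchSketch {

    // Builds a tiny document whose element names are all lowercase,
    // as they would be after parsing with names/elems set to lower.
    public static Document buildLowercaseDoc() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element html = doc.createElement("html");
        doc.appendChild(html);
        html.appendChild(doc.createElement("body"));
        return doc;
    }

    // Plain lookup: the requested name is compared as given (case-sensitive here).
    public static int countByTagName(Document doc, String name) {
        return doc.getElementsByTagName(name).getLength();
    }

    // Lookup that first normalizes the requested name to the case used in the DOM.
    public static int countNormalized(Document doc, String name) {
        return countByTagName(doc, name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildLowercaseDoc();
        System.out.println(countByTagName(doc, "BODY"));  // uppercase request, lowercase DOM
        System.out.println(countByTagName(doc, "body"));  // matching case
        System.out.println(countNormalized(doc, "BODY")); // request normalized first
    }
}
```

The first lookup finds nothing because an uppercase request never equals a lowercase tag name, which mirrors the comparison failure reported for DeepNodeListImpl. The other two lookups each find the single body element.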

rbri commented 1 week ago

I will add some more tests and fix that. But it may have to wait for the weekend.

rbri commented 1 week ago

@radkovo the snapshot has been updated again.

radkovo commented 1 week ago

Perfect, all my tests in jStyleParser pass now. Well done, thank you very much.

rbri commented 1 week ago

@radkovo you're welcome.

The next release might be available at the beginning of December.