What steps will reproduce the problem?
Parse a source that contains an img tag with no attributes ("<img>").
Example code:
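```java
import java.io.StringReader;
import org.outerj.daisy.diff.helper.NekoHtmlParser;
import org.outerj.daisy.diff.html.dom.DomTreeBuilder;
import org.xml.sax.InputSource;

String html = "<img>";
NekoHtmlParser cleaner = new NekoHtmlParser();
InputSource inputSource = new InputSource(new StringReader(html));
DomTreeBuilder handler = new DomTreeBuilder();
cleaner.parse(inputSource, handler); // throws NullPointerException in Daisy Diff 1.2
```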
What is the expected output? What do you see instead?
The expected result is that the parse completes successfully and populates the DomTreeBuilder.
The actual result is a NullPointerException thrown from the ImageNode constructor (org.outerj.daisy.diff.html.dom.ImageNode.<init>).
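One plausible cause, not confirmed against the Daisy Diff source: the SAX `Attributes.getValue` method returns null when the requested attribute is absent, so a constructor that reads the src value and then calls a method on it would fail in exactly this way for a bare "<img>". The following is a minimal sketch of a null-safe alternative. `ImageTextSketch`, `imageText` and the "<img>...</img>" text format are hypothetical names chosen for illustration, not Daisy Diff's actual API.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

public class ImageTextSketch {

    // Hypothetical helper: builds a comparison text for an <img> element
    // without assuming that the "src" attribute is present.
    static String imageText(Attributes attrs) {
        // Attributes.getValue returns null when the attribute is missing,
        // so calling toLowerCase() on it directly would throw a NullPointerException.
        String src = attrs.getValue("src");
        String normalized = (src == null) ? "" : src.toLowerCase();
        return "<img>" + normalized + "</img>";
    }

    public static void main(String[] args) {
        // Attributes reported for a bare "<img>" tag: none at all.
        AttributesImpl empty = new AttributesImpl();

        // Attributes reported for <img src="Logo.PNG">.
        AttributesImpl withSrc = new AttributesImpl();
        withSrc.addAttribute("", "src", "src", "CDATA", "Logo.PNG");

        System.out.println(imageText(empty));   // prints <img></img>
        System.out.println(imageText(withSrc)); // prints <img>logo.png</img>
    }
}
```

Treating a missing src as an empty string keeps two attribute-less images comparable as equal, which seems a reasonable default for a diff, but other choices are possible.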
What version of the product are you using? On what operating system?
Daisy Diff 1.2 with Java 6 and Java 7 on Linux and Windows.
Please provide any additional information below.
Adding a src attribute to the img tag makes it parse correctly.
We are parsing user-supplied data, sometimes pasted from other applications, which is how we wound up with an img tag with no attributes.
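Until the parser handles this case, a workaround is to pre-process user-supplied HTML so that every img tag carries a src attribute before it reaches NekoHtmlParser. This is a minimal sketch, assuming (as the report suggests) that any src attribute, even an empty one, is enough to avoid the exception. `ImgSrcWorkaround` and `addMissingSrc` are hypothetical names, and a regular expression is only a rough stand-in for real HTML parsing: it will misfire on, for example, a ">" inside a quoted attribute value.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImgSrcWorkaround {

    // Matches an <img ...> or <img .../> tag (case-insensitive).
    // Group 1 holds the attribute text, group 2 an optional self-closing "/".
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b([^>]*?)(/?)>", Pattern.CASE_INSENSITIVE);

    // Matches a standalone "src" attribute name (so "data-src" is ignored).
    private static final Pattern SRC_ATTR =
            Pattern.compile("(^|\\s)src(\\s|=|$)", Pattern.CASE_INSENSITIVE);

    // Hypothetical pre-processing step: insert src="" into every img tag
    // that has no src attribute, leaving all other markup untouched.
    static String addMissingSrc(String html) {
        Matcher m = IMG_TAG.matcher(html);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this usable on Java 6/7
        while (m.find()) {
            String attrs = m.group(1);
            String replacement = SRC_ATTR.matcher(attrs).find()
                    ? m.group()
                    : "<img src=\"\"" + attrs + m.group(2) + ">";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(addMissingSrc("<p><img></p>"));       // prints <p><img src=""></p>
        System.out.println(addMissingSrc("<img alt=\"x\"/>"));   // prints <img src="" alt="x"/>
        System.out.println(addMissingSrc("<IMG SRC=\"a.png\">")); // prints <IMG SRC="a.png">
    }
}
```

In the reproduction code, the input would then be built as `new InputSource(new StringReader(ImgSrcWorkaround.addMissingSrc(html)))`.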
Stack Trace:
```
java.lang.NullPointerException
	at org.outerj.daisy.diff.html.dom.ImageNode.<init>(Unknown Source)
	at org.outerj.daisy.diff.html.dom.DomTreeBuilder.endElement(Unknown Source)
	at org.outerj.daisy.diff.helper.MergeCharacterEventsHandler.endElement(Unknown Source)
	at org.outerj.daisy.diff.helper.NekoHtmlParser$RemoveNamespacesHandler.endElement(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
	at org.apache.xerces.parsers.AbstractXMLDocumentParser.emptyElement(Unknown Source)
	at org.cyberneko.html.filters.DefaultFilter.emptyElement(DefaultFilter.java:148)
	at org.cyberneko.html.filters.NamespaceBinder.emptyElement(NamespaceBinder.java:302)
	at org.cyberneko.html.HTMLTagBalancer.startElement(HTMLTagBalancer.java:617)
	at org.cyberneko.html.HTMLScanner$ContentScanner.scanStartElement(HTMLScanner.java:2637)
	at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2012)
	at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:910)
	at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
	at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
	at org.outerj.daisy.diff.helper.NekoHtmlParser.parse(Unknown Source)
```
Original issue reported on code.google.com by mejari on 31 Jan 2013 at 6:37