LibrePDF / OpenPDF

OpenPDF is a free Java library for creating and editing PDF files, released under the LGPL and MPL open source licenses. OpenPDF is based on a fork of iText. We welcome contributions from other developers. Please feel free to submit pull requests and bug reports to this GitHub repository.

Unable to parse HTML table with whitespace inside it #59

Status: Closed (imgx64 closed this issue 6 years ago)

imgx64 commented 7 years ago
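import java.io.StringReader;
import com.lowagie.text.Document;
import com.lowagie.text.html.HtmlParser;

// No whitespace between <table> and <tr>: parsing succeeds.
Document doc1 = new Document();
doc1.open();
HtmlParser.parse(doc1, new StringReader("<table><tr><td>test</td></tr></table>")); // succeeds

// A single space between <table> and <tr>: parsing throws.
Document doc2 = new Document();
doc2.open();
HtmlParser.parse(doc2, new StringReader("<table> <tr><td>test</td></tr></table>")); // fails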

The last line throws this exception:

Exception in thread "main" java.lang.ClassCastException: com.lowagie.text.Table cannot be cast to com.lowagie.text.TextElementArray
    at com.lowagie.text.xml.SAXiTextHandler.handleStartingTags(SAXiTextHandler.java:229)
    at com.lowagie.text.html.SAXmyHtmlHandler.startElement(SAXmyHtmlHandler.java:206)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:509)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1359)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2784)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:327)
    at com.lowagie.text.html.HtmlParser.go(HtmlParser.java:85)
    at com.lowagie.text.html.HtmlParser.parse(HtmlParser.java:190)
    at com.example.PDF.main(PDF.java:17)
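
The only difference between the two inputs is the space between `<table>` and `<tr>`. The trace shows the parse running through the JDK's SAX parser, and a plain SAX parser reports that space to its handler as character data. The sketch below uses only JDK classes (no OpenPDF code) to list the events each input produces. The class name `SaxWhitespaceDemo` and the event labels are made up for illustration. That this extra text event is what leads to the `ClassCastException` is an assumption; nothing in this thread confirms it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of OpenPDF: records the SAX events for a snippet.
class SaxWhitespaceDemo {

    /** Returns the SAX events as strings such as "start:table" or "text:[ ]". */
    static List<String> events(String markup) throws Exception {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                out.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                out.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Whitespace between tags arrives here, like any other text.
                out.add("text:[" + new String(ch, start, length) + "]");
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(markup)), handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Input from the issue that parses fine.
        System.out.println(events("<table><tr><td>test</td></tr></table>"));
        // Input from the issue that fails: expect a text event between table and tr.
        System.out.println(events("<table> <tr><td>test</td></tr></table>"));
    }
}
```

In the second list, a text event holding a single space should appear between `start:table` and `start:tr`. A handler that treats every text event as content to add to the current element would then try to add text directly to a table, which is at least consistent with the cast failure above.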

pom.xml

<dependency>
    <groupId>com.github.librepdf</groupId>
    <artifactId>openpdf</artifactId>
    <version>1.0.5</version>
</dependency>
<dependency>
    <groupId>com.github.librepdf</groupId>
    <artifactId>pdf-html</artifactId>
    <version>1.0.5</version>
</dependency>

riccardo-noviello commented 6 years ago

Hello, I fixed this and opened a PR: https://github.com/LibrePDF/OpenPDF/pull/66

riccardo-noviello commented 6 years ago

The PR has been merged. Please close this issue.
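
For anyone who cannot pick up the merged fix yet, one possible workaround is to remove whitespace that sits purely between tags before handing the markup to `HtmlParser.parse`. The sketch below is a hypothetical pre-processing helper (`InterTagWhitespace.strip` is not an OpenPDF API). It turns the failing input from this issue into the one reported as succeeding. It is a blunt tool: it also removes spaces between adjacent inline elements such as `</b> <i>`, which can change how text renders.

```java
import java.util.regex.Pattern;

// Hypothetical workaround helper, not part of OpenPDF.
class InterTagWhitespace {

    // Matches runs of whitespace that sit directly between a '>' and a '<'.
    private static final Pattern BETWEEN_TAGS = Pattern.compile(">\\s+<");

    /** Removes whitespace-only gaps between tags; text inside elements is untouched. */
    static String strip(String markup) {
        return BETWEEN_TAGS.matcher(markup).replaceAll("><");
    }

    public static void main(String[] args) {
        // Failing input from the issue, normalized to the form that parses.
        System.out.println(strip("<table> <tr><td>test</td></tr></table>"));
    }
}
```

The cleaned string can then be wrapped in a `StringReader` and passed to the parser exactly as in the original report.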