bigtang5 closed this 11 months ago
.vsdx fails too, but it stops at extractor.getText(), so it is not the same bug as .doc.
Are you sure you have a valid .doc file at that point? What happens if you try to parse the document fully?
After a closer look I found the issue: the shadow-jar did not have the service files merged properly, so text extraction was not fully available for some file types.
Should be fixed on master now.
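For anyone who wants to check their own fat jar for the same problem, here is a minimal sketch that lists the provider entries visible on the classpath. It assumes the service interface is org.apache.poi.extractor.ExtractorProvider, which I am inferring from the provider class named in the error message, so treat that name as an assumption. ServiceFileCheck and its methods are hypothetical names used only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ServiceFileCheck {

    // Assumption: service name inferred from the provider in the error message.
    static final String POI_EXTRACTOR_SERVICE = "org.apache.poi.extractor.ExtractorProvider";

    /** Parses one META-INF/services file: one class name per line, '#' starts a comment. */
    static List<String> parseProviders(Reader content) throws IOException {
        List<String> providers = new ArrayList<>();
        BufferedReader reader = new BufferedReader(content);
        String line;
        while ((line = reader.readLine()) != null) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // drop trailing comment
            }
            line = line.trim();
            if (!line.isEmpty()) {
                providers.add(line);
            }
        }
        return providers;
    }

    /** Collects provider names from every service file the class loader can see. */
    static List<String> visibleProviders(String serviceName, ClassLoader loader) throws IOException {
        List<String> all = new ArrayList<>();
        String resource = "META-INF/services/" + serviceName;
        for (URL url : Collections.list(loader.getResources(resource))) {
            try (Reader r = new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)) {
                all.addAll(parseProviders(r));
            }
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = ServiceFileCheck.class.getClassLoader();
        for (String name : visibleProviders(POI_EXTRACTOR_SERVICE, loader)) {
            System.out.println(name);
        }
    }
}
```

If this prints only a single provider such as POIXMLExtractorFactory, as in the error message in this thread, the service files from the other POI jars were probably overwritten instead of merged when the fat jar was built.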
try {
    InputStream file = this.getAssets().open("test.doc");
    POITextExtractor extractor = ExtractorFactory.createExtractor(file);
    System.out.println(extractor.getText());
} catch (IOException e) {
    e.printStackTrace(); // execution ends up here
}
It shows: java.io.IOException: Your InputStream was neither an OLE2 stream, nor an OOXML stream or you haven't provide the poi-ooxml.jar and/or poi-scratchpad.jar in the classpath/modulepath - FileMagic: OLE2, providers: [org.apache.poi.ooxml.extractor.POIXMLExtractorFactory@aafc2f6]
.xls and .vsd files also have this problem.