centic9 / poi-on-android

A sample project that shows how Apache POI can be used in an Android application
Apache License 2.0

POITextExtractor bug! docx, xlsx, pptx are OK, but doc file throws Exception! #98

Closed bigtang5 closed 11 months ago

bigtang5 commented 2 years ago

    try {
        InputStream file = this.getAssets().open("test.doc");
        POITextExtractor extractor = ExtractorFactory.createExtractor(file);
        System.out.println(extractor.getText());
    } catch (IOException e) {
        e.printStackTrace(); //go here!!
    }

shows: java.io.IOException: Your InputStream was neither an OLE2 stream, nor an OOXML stream or you haven't provide the poi-ooxml.jar and/or poi-scratchpad.jar in the classpath/modulepath - FileMagic: OLE2, providers: [org.apache.poi.ooxml.extractor.POIXMLExtractorFactory@aafc2f6]

.xls and .vsd files also have this problem.

bigtang5 commented 2 years ago

.vsdx fails too, but it stops at extractor.getText(), so it is not the same bug as .doc.

centic9 commented 11 months ago

Are you sure you have a valid .doc file at that point? What happens if you try to parse the document fully?

centic9 commented 11 months ago

After a closer look I found the issue: the shadow-jar did not have the service files merged properly, so text extraction was not fully available for some file types.

Should be fixed on master now.
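
For readers wondering why unmerged service files produce the error above: each POI jar can ship its own `META-INF/services` file with the same name, and a fat jar keeps only one of them unless they are merged. That matches the `providers: [...]` list in the exception, which shows a single entry. The following is a minimal sketch of what merging means, using only the JDK. `ServiceFileMerge` and the two `com.example` provider names are hypothetical placeholders; only `POIXMLExtractorFactory` comes from the error message. The Gradle Shadow plugin offers a `mergeServiceFiles()` option for this, though the thread does not say whether that is the exact change made on master.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Apache POI or any build plugin. */
final class ServiceFileMerge {

    /**
     * Merges the contents of several same-named META-INF/services files
     * into one ordered, duplicate-free list of provider class names.
     */
    static List<String> merge(List<String> fileContents) {
        Set<String> providers = new LinkedHashSet<>();
        for (String content : fileContents) {
            for (String line : content.split("\\R")) {
                // In the ServiceLoader file format, '#' starts a comment.
                int hash = line.indexOf('#');
                String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                if (!name.isEmpty()) {
                    providers.add(name);
                }
            }
        }
        return new ArrayList<>(providers);
    }

    public static void main(String[] args) {
        // One service file per jar; the first two names are placeholders.
        List<String> perJar = List.of(
            "com.example.PlaceholderMainProvider\n",
            "com.example.PlaceholderOle2Provider\n",
            "org.apache.poi.ooxml.extractor.POIXMLExtractorFactory\n");

        // Without merging, only one file survives, so only one provider is visible.
        System.out.println("unmerged: " + merge(perJar.subList(2, 3)));
        // With merging, every provider is listed and can be discovered at runtime.
        System.out.println("merged:   " + merge(perJar));
    }
}
```

If only the OOXML provider survives, as in the reported exception, OLE2-based formats such as .doc, .xls, and .vsd have no registered extractor even though their classes are present in the jar.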