Bark-us / henkei

MIT License

Unable to parse docx and pdf #1

Closed Jasmeet2011 closed 4 years ago

Jasmeet2011 commented 4 years ago

Hi, I am trying to read PDF/DOCX files, but I keep getting this error:

Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pkg.PackageParser@1b9a632
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)

However, when I tried streaming a file from the web like this:

    henkei = Henkei.new 'http://svn.apache.org/repos/asf/poi/trunk/test-data/document/sample.docx'
    text = henkei.text

it worked! Can you please tell me where I am going wrong?
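Since the URL case works and the local case does not, one cheap thing to rule out is a damaged or mislabeled local file. The sketch below is only a sanity check, not a diagnosis of the Tika error: `file_signature` is a hypothetical helper (not part of henkei) that looks at a file's leading bytes. It relies on two format facts: a PDF starts with `%PDF-`, and a DOCX is a ZIP container, so it starts with `PK\x03\x04`. The path in the usage comment is a placeholder, and the commented-out `Henkei.new(path)` call assumes the constructor accepts a local path the same way it accepts a URL.

```ruby
# Hypothetical helper (not part of henkei): classify a file by its first bytes.
# Returns :pdf, :zip (the container used by .docx), or :unknown.
def file_signature(path)
  # Read at most 8 bytes in binary mode; to_s covers an empty file (nil).
  head = File.binread(path, 8).to_s
  return :pdf if head.start_with?('%PDF-'.b)
  return :zip if head.start_with?("PK\x03\x04".b)

  :unknown
end

# Usage (placeholder path, shown as comments so the block runs without the gem):
#   path = 'sample.docx'
#   puts file_signature(path)       # a valid .docx should report :zip
#   text = Henkei.new(path).text    # assumed: same call as the URL case
```

If the helper reports `:unknown` for a file that should be a PDF or DOCX, the file itself is the likelier culprit; if it reports the expected type, the problem lies elsewhere (for example in how the path is passed or in the bundled Tika jar).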

brandonhilkert commented 4 years ago

This is purely a copy of https://github.com/abrom/henkei. I'd recommend asking there for clarification.