How to access Apache Tika's recursiveJSON object using python-tika?

chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services, allowing Tika to be called natively in the Python community.

Apache License 2.0

I'm using Apache Tika to OCR a batch of PDFs. When I use the GUI (launched with java -jar tika-app-1.22.jar), everything works: I choose "Recursive JSON" from the "View" menu and all the text is there, even though nothing appears under "Main Content". But the Python wrapper doesn't seem to offer any option for extracting the "Recursive JSON" output, and print(parsed['content']) prints an empty string. print(parsed['metadata']) does print the metadata correctly, but I need the content. What am I missing?
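
A minimal sketch of one workaround, assuming a Tika server (for example tika-server-1.22.jar) is running at http://localhost:9998. The server's /rmeta/text endpoint returns the same kind of recursive JSON the GUI shows: a list with one entry per document (the PDF first, then any embedded resources such as images), each carrying its extracted text in an X-TIKA:content field. If the OCR text ends up in the embedded entries, that could explain why the top-level content is empty. The helpers fetch_recursive_json and collect_content are hypothetical names for illustration, not part of tika-python, and the code calls the server directly with the standard library rather than going through the wrapper.

```python
import json
import urllib.request

# Assumption: a Tika server is listening on its default port 9998.
# Adjust TIKA_URL if your server runs elsewhere.
TIKA_URL = "http://localhost:9998"


def fetch_recursive_json(pdf_path, tika_url=TIKA_URL):
    """Hypothetical helper: PUT a file to /rmeta/text and return the parsed JSON.

    The response is a list of metadata dicts. The first entry describes the
    container document; later entries describe embedded documents (such as
    images inside the PDF), which may be where OCR text appears.
    """
    with open(pdf_path, "rb") as f:
        data = f.read()
    request = urllib.request.Request(
        tika_url + "/rmeta/text",
        data=data,
        method="PUT",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def collect_content(rmeta):
    """Hypothetical helper: join X-TIKA:content from every entry, skipping blanks."""
    parts = []
    for entry in rmeta:
        # Some entries have no content key at all, or only whitespace.
        text = entry.get("X-TIKA:content") or ""
        if text.strip():
            parts.append(text.strip())
    return "\n\n".join(parts)
```

With the server up, collect_content(fetch_recursive_json("scan.pdf")) should return the joined text (scan.pdf is a placeholder path). Inspecting the raw list instead shows which entry the OCR text actually came from.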

how to access Apache Tika's recursiveJSON object using python-tika? #362