chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

Tika on Databricks #245

Closed: Murthy582 closed this issue 5 years ago

Murthy582 commented 5 years ago

I'm new to Tika and trying to set up Tika on Databricks. Do I need to install both tika-python and the Tika server jar file on the Databricks cluster to make it work?

If so, how do I change the Tika parser config to point to the Tika server run by the Databricks cluster?

chrismattmann commented 5 years ago

The tika-python library will download tika-server and start it for you. You should be all good!
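For anyone landing here later, a minimal sketch of both setups, assuming `tika` is installed on the cluster (for example via `pip install tika`) and that Java is available on the node, since the server is a jar. The helper names `configure_tika` and `extract_text`, the host `tika-host`, and the file path are hypothetical. `TIKA_SERVER_ENDPOINT` and `TIKA_CLIENT_ONLY` are the environment variables described in the tika-python README. They are read when `tika` is first imported, so set them before that import.

```python
import os


def configure_tika(endpoint=None):
    """Set the environment variables tika-python reads at import time.

    endpoint=None keeps the default behavior: the library downloads the
    tika-server jar and starts it locally on first use.
    Passing a URL switches to client-only mode against that server.
    """
    if endpoint is None:
        os.environ.pop("TIKA_SERVER_ENDPOINT", None)
        os.environ.pop("TIKA_CLIENT_ONLY", None)
    else:
        os.environ["TIKA_SERVER_ENDPOINT"] = endpoint
        os.environ["TIKA_CLIENT_ONLY"] = "True"
    return {
        name: os.environ.get(name)
        for name in ("TIKA_SERVER_ENDPOINT", "TIKA_CLIENT_ONLY")
    }


def extract_text(path):
    """Parse one file and return its extracted text (may be None)."""
    # Imported late so the settings chosen above are picked up.
    from tika import parser

    parsed = parser.from_file(path)
    return parsed.get("content")


# Default setup, matching the answer above (no separate jar install):
#   configure_tika()
#   print(extract_text("/dbfs/tmp/sample.pdf"))  # hypothetical path
#
# Pointing at a server you run yourself (hypothetical host):
#   configure_tika("http://tika-host:9998")
```

In the default mode the first parse call needs outbound network access from the node to download the server jar. If `tika` was already imported in the notebook session, changing these variables afterwards will likely have no effect until the Python process restarts.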