melish closed 6 years ago
I experimented with the -enableUnsecureFeatures -enableFileUrl flags in Tika to let it fetch the remote URL content itself. It works, but it has serious security downsides and would also require giving the Tika server access to the internet.
Consequently, the remote URL tagging implementation in de4b243 will stream the remote content to a temporary file, then feed it to Tika.
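For anyone following along, here is a rough sketch of that download-then-forward approach. It is not the code from de4b243, just an illustration under a few assumptions: the Tika server is reachable at http://localhost:9998/tika (its usual default, but check your deployment), and the helper names `download_to_tempfile`, `extract_text_with_tika` and `fetch_and_extract` are made up for this example.

```python
import os
import shutil
import tempfile
import urllib.request

# Assumption: Tika server on its usual default port; adjust for your setup.
TIKA_URL = "http://localhost:9998/tika"


def download_to_tempfile(url, opener=urllib.request.urlopen, chunk_size=64 * 1024):
    """Stream the remote content to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        try:
            with opener(url) as response:
                # Copy in chunks so large documents are never held in memory.
                shutil.copyfileobj(response, tmp, chunk_size)
        except Exception:
            tmp.close()
            os.remove(tmp.name)
            raise
    return tmp.name


def extract_text_with_tika(path, tika_url=TIKA_URL, opener=urllib.request.urlopen):
    """PUT the local file to the Tika server and return the extracted text."""
    with open(path, "rb") as fh:
        request = urllib.request.Request(
            tika_url,
            data=fh,
            method="PUT",
            headers={
                "Accept": "text/plain",
                "Content-Length": str(os.path.getsize(path)),
            },
        )
        with opener(request) as response:
            return response.read().decode("utf-8")


def fetch_and_extract(url, tika_url=TIKA_URL, opener=urllib.request.urlopen):
    """Download, extract, and always clean up the temporary file."""
    path = download_to_tempfile(url, opener=opener)
    try:
        return extract_text_with_tika(path, tika_url=tika_url, opener=opener)
    finally:
        os.remove(path)
```

With this layout only the application needs outbound internet access; Tika just receives bytes over the local connection. The `opener` parameter exists only so the network calls can be swapped out in tests.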
curl -X POST -d "url=http://extwprlegs1.fao.org/docs/pdf/cay158894.pdf" http://localhost/extract/species/url
This request should fetch the PDF file, extract its text, and run the extraction (all synchronously).