eaudeweb / percolator

Poor man's auto tagging based on exact matches, synonyms and common abbreviations

Add support for external URLs #4

Closed: melish closed this issue 6 years ago

melish commented 6 years ago

curl -X POST -d "url=http://extwprlegs1.fao.org/docs/pdf/cay158894.pdf" http://localhost/extract/species/url

This request should fetch the PDF file, extract its text, and run the species extraction on it, all synchronously.
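For clients not using curl, the sketch below builds the same request with Python's standard library. It assumes only what the curl command shows: a POST to `/extract/species/url` with a form field named `url`. The helper name `build_extract_request` is hypothetical. curl's `-d` sends the string as given with `Content-Type: application/x-www-form-urlencoded`; `urlencode` percent-encodes the value, which a form parser decodes back to the same URL.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint and document URL taken from the curl example in this issue.
ENDPOINT = "http://localhost/extract/species/url"
DOC_URL = "http://extwprlegs1.fao.org/docs/pdf/cay158894.pdf"


def build_extract_request(endpoint: str, doc_url: str) -> Request:
    """Build a POST request equivalent to `curl -X POST -d "url=..." endpoint`.

    Hypothetical helper: the body is form-encoded, matching the content
    type curl uses for -d.
    """
    body = urlencode({"url": doc_url}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(build_extract_request(ENDPOINT, DOC_URL))` would block until the server has fetched the PDF, extracted its text and run the extraction, since the endpoint is meant to be synchronous.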

andrei-duhnea commented 6 years ago

I experimented with the -enableUnsecureFeatures and -enableFileUrl flags in Tika, which let the Tika server fetch the remote URL content itself. This works, but it has serious security downsides and would also require opening the Tika server up to the internet. Consequently, the remote URL tagging implementation in de4b243 streams the remote content to a temporary file and then feeds that file to Tika.
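A minimal sketch of the "download to a temporary file, then hand it to Tika" approach follows. It is not the code from de4b243. It assumes a stock tika-server reachable at `http://localhost:9998/tika` (9998 is tika-server's default port) and uses its `PUT /tika` endpoint with `Accept: text/plain` to get plain text back. The names `TIKA_URL`, `fetch_to_tempfile`, `extract_text` and `extract_remote_text` are hypothetical.

```python
import os
import shutil
import tempfile
from urllib.request import Request, urlopen

# Assumed Tika server address (hypothetical; adjust to the deployment).
TIKA_URL = "http://localhost:9998/tika"


def fetch_to_tempfile(url: str, chunk_size: int = 64 * 1024) -> str:
    """Stream the remote document to a temporary file and return its path.

    The body is copied in chunks, so a large PDF is never held in memory
    all at once. The caller must delete the file afterwards.
    """
    fd, path = tempfile.mkstemp(suffix=".download")
    try:
        # Open the local file first so the descriptor is closed even if
        # fetching the URL fails.
        with os.fdopen(fd, "wb") as out, urlopen(url) as resp:
            shutil.copyfileobj(resp, out, chunk_size)
    except BaseException:
        os.unlink(path)  # don't leave partial downloads behind
        raise
    return path


def extract_text(path: str, tika_url: str = TIKA_URL) -> str:
    """PUT the local file to Tika and return the extracted plain text.

    The open file object is passed as the request body, so it is streamed.
    application/octet-stream leaves content-type detection to Tika.
    """
    headers = {
        "Accept": "text/plain",
        "Content-Type": "application/octet-stream",
        "Content-Length": str(os.path.getsize(path)),
    }
    with open(path, "rb") as f:
        req = Request(tika_url, data=f, headers=headers, method="PUT")
        with urlopen(req) as resp:
            return resp.read().decode("utf-8")


def extract_remote_text(url: str) -> str:
    """Download `url`, run it through Tika, and always clean up the temp file."""
    path = fetch_to_tempfile(url)
    try:
        return extract_text(path)
    finally:
        os.unlink(path)
```

Because only this process talks to the remote host, the Tika server can stay on an internal network with its unsecure features disabled, which is the trade-off described above.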