Norconex / importer

Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, whatever its format (HTML, PDF, Word, etc.). In addition, it lets you perform any manipulation on the extracted text before using it in your own service or application.
http://www.norconex.com/collectors/importer/
Apache License 2.0

Update dependency on Tika 1.27 to Tika 2.x #121

Open Dhanvanthri opened 1 year ago

Dhanvanthri commented 1 year ago

I mistakenly posted an issue on Collector about this problem. It turns out that Collector pulls in Importer as a transitive dependency, which in turn pulls in Tika 1.27.

My application relies on Tika 2.x. Is there any interest in upgrading the Tika dependency in Norconex Importer?
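
For anyone trying to confirm which Tika version actually ends up on the classpath after transitive resolution, here is a minimal sketch. It assumes the `tika-core` jar carries the `pom.properties` file that Maven normally embeds under `META-INF/maven/<groupId>/<artifactId>/`; the class name `TikaVersionCheck` and the method `mavenVersion` are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Optional;
import java.util.Properties;

public class TikaVersionCheck {

    /**
     * Looks up the "version" entry of the pom.properties file that Maven
     * usually embeds in a jar. Returns an empty Optional when the artifact
     * is not on the classpath or the jar does not carry that file.
     */
    static Optional<String> mavenVersion(String groupId, String artifactId) {
        String path = "META-INF/maven/" + groupId + "/" + artifactId + "/pom.properties";
        try (InputStream in = TikaVersionCheck.class.getClassLoader().getResourceAsStream(path)) {
            if (in == null) {
                return Optional.empty();
            }
            Properties props = new Properties();
            props.load(in);
            return Optional.ofNullable(props.getProperty("version"));
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Assumption: tika-core is published under groupId org.apache.tika.
        String version = mavenVersion("org.apache.tika", "tika-core")
                .orElse("not found on classpath");
        System.out.println("Resolved tika-core version: " + version);
    }
}
```

Running this inside the application that depends on Collector should show whether the 1.x or the 2.x artifact won resolution. If two copies of the file are present, only the first one found by the class loader is reported.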

ohtwadi commented 1 year ago

I'm marking this as a Feature Request. We will look at upgrading Tika to the latest version, which is 2.5 at this time. Thank you.

ejschoen commented 1 month ago

Any thoughts as to whether this is feasible for either the 2.x or 3.x branches of Norconex Importer? Even the last release of the Tika 1.x branch carries at least one inherited vulnerability severe enough that it technically prevents us from deploying to customers without a bunch of additional analysis and paperwork.