Norconex / importer

Norconex Importer is a Java library and command-line application meant to parse and extract content out of a file as plain text, whatever its format (HTML, PDF, Word, etc.). In addition, it lets you manipulate the extracted text before using it in your own service or application.
http://www.norconex.com/collectors/importer/
Apache License 2.0

Sort detected languages in descending order #82

Closed by rustyx 6 years ago

rustyx commented 6 years ago

Fixes https://github.com/Norconex/collector-http/issues/520
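For readers without the diff at hand, here is a minimal sketch of what sorting detected languages in descending order could look like. It assumes the ordering is by detection probability, which the title does not state explicitly. The `DetectedLanguage` class, its `tag` and `probability` fields, and the `sortDescending` method are hypothetical names used for illustration only; they are not taken from the Importer code base or from this pull request.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DetectedLanguageSort {

    // Hypothetical holder for one detection result (not the Importer's own class).
    static final class DetectedLanguage {
        final String tag;          // language tag, e.g. "en"
        final double probability;  // assumed detection confidence in [0, 1]

        DetectedLanguage(String tag, double probability) {
            this.tag = tag;
            this.probability = probability;
        }
    }

    // Returns a new list ordered from most to least probable; the input is left untouched.
    static List<DetectedLanguage> sortDescending(List<DetectedLanguage> detected) {
        List<DetectedLanguage> sorted = new ArrayList<>(detected);
        sorted.sort(Comparator
                .comparingDouble((DetectedLanguage l) -> l.probability)
                .reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<DetectedLanguage> detected = Arrays.asList(
                new DetectedLanguage("fr", 0.2),
                new DetectedLanguage("en", 0.7),
                new DetectedLanguage("de", 0.1));

        // Print each language with its probability, most probable first.
        for (DetectedLanguage l : sortDescending(detected)) {
            System.out.println(l.tag + " " + l.probability);
        }
    }
}
```

With this ordering, a caller that only wants the best guess can read the first element of the returned list.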