Norconex / importer

Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, whatever its format (HTML, PDF, Word, etc). In addition, it allows you to perform any manipulation on the extracted text before using it in your own service or application.
http://www.norconex.com/collectors/importer/
Apache License 2.0

Bump tika.version from 1.18 to 1.22 in /norconex-importer #106

Closed: dependabot[bot] closed this 4 years ago

dependabot[bot] commented 4 years ago

Bumps tika.version from 1.18 to 1.22.
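For readers unfamiliar with how a single property bump updates several artifacts at once, the fragment below is a minimal sketch of what such a change typically looks like in a Maven `pom.xml`. The property name `tika.version` and the three artifact names come from this PR; the surrounding layout is an assumption and the actual `norconex-importer/pom.xml` may be organized differently.

```xml
<!-- Hypothetical excerpt; the real norconex-importer/pom.xml may differ. -->
<properties>
  <!-- Value was 1.18 before this bump. -->
  <tika.version>1.22</tika.version>
</properties>

<dependencies>
  <!-- Each Tika artifact reads its version from the shared property,
       so changing the property updates all three together. -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>${tika.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>${tika.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-translate</artifactId>
    <version>${tika.version}</version>
  </dependency>
</dependencies>
```

Keeping the version in one property is a common way to make sure related artifacts from the same project stay on the same release.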

Updates tika-core from 1.18 to 1.22

Changelog

*Sourced from [tika-core's changelog](https://github.com/apache/tika/blob/master/CHANGES.txt).*

> Release 2.0.0 - ???
>
> BREAKING CHANGES in 2.0.0
>
> * Remove deprecated Metadata keys/properties (TIKA-1974).
>
> Other changes
>
> Release 1.23
>
> * NOTE: tika-server no longer returns 415 for file types for which there is no parser.
> * Upgrade to POI 4.1.1 (TIKA-2851).
> * Upgrade to PDFBox 2.0.17 (TIKA-2951).
> * Ensure that the PDFParser respects custom configuration of Tesseract from tika-config.xml via Eric Pugh (TIKA-2970).
> * Add parser for XLIFF v1.2 files (TIKA-2975).
> * Add mime type detection support for WebAssembly (TIKA-2894), HEIF / HEIC images (TIKA-2942), Digilite FDF (TIKA-2988); and xml-root detection for XFDF (TIKA-2990) and XDP (TIKA-2989).
> * Add an XLZ Parser (TIKA-2976).
> * Fix deadlock with ForkParser when InputStream throws IOException (TIKA-2892).
>
> Release 1.22 - 07/29/2019
>
> * NOTE: Known regression: PDFBOX-4587 -- PDF passwords with codepoints between 0xF000 and 0XF0000 will cause an exception.
> * Add parser for HWP v5 files via SooMyung Lee (soomyung) and JinSup Kim (ddoleye) (TIKA-2909).
> * Fix order of closing streams to avoid "Failed to close temporary resource" exception (TIKA-2908).
> * Improve AutoDetectReader performance by caching encoding detector (TIKA-1568).
> * Prevent RTFParser from outputting illegal tag combinations (TIKA-2889).
> * Fix RereadableInputStream to release all resources (TIKA-2903).
> * Implement custom language identifier in the tika-eval module based on OpenNLP's language detector; add 18 languages and add common words lists for all 121 languages (TIKA-2790).
>
> ... (truncated)

Commits

- [`aa2a385`](https://github.com/apache/tika/commit/aa2a385a630c702d2fb9a6cb37e280229c97f85a) [maven-release-plugin] prepare release 1.22-rc4
- [`de0fca9`](https://github.com/apache/tika/commit/de0fca9688dcd505f5d409a75f3b043a12eacf0b) roll back for rc#4...update date
- [`4db132e`](https://github.com/apache/tika/commit/4db132e98ad25f65ed1f47074457cdb6fcf43ff4) roll back for rc#4
- [`c5daaf4`](https://github.com/apache/tika/commit/c5daaf4277d277c2e3fc749071ec8b86579f7553) Merge remote-tracking branch 'origin/branch_1x' into branch_1x
- [`357c163`](https://github.com/apache/tika/commit/357c163a76713a3dc519e28936b1f36c2d6ab0c6) include opennlp lang model in tika-eval during assembly
- [`0f3790e`](https://github.com/apache/tika/commit/0f3790ebd17b8345480ef0f5e8552ed615a7f121) [maven-release-plugin] prepare for next development iteration
- [`c23f47e`](https://github.com/apache/tika/commit/c23f47e0bc9960530af4c02ce8d0d372758f0e1b) [maven-release-plugin] prepare release 1.23-rc3
- [`c25b81d`](https://github.com/apache/tika/commit/c25b81d3c659dd371ff9fb090144f453102e789b) Merge remote-tracking branch 'origin/branch_1x' into branch_1x
- [`fd40040`](https://github.com/apache/tika/commit/fd40040f1c02d66820fb8d82f0a60a34dd973d3d) roll back for rc#3, again...
- [`950ee35`](https://github.com/apache/tika/commit/950ee35243b920692b9e9e355c157d129cade98c) [maven-release-plugin] prepare for next development iteration
- Additional commits viewable in [compare view](https://github.com/apache/tika/compare/1.18...1.22)


Updates tika-parsers from 1.18 to 1.22

Changelog and commits are identical to those listed for tika-core above (same [changelog](https://github.com/apache/tika/blob/master/CHANGES.txt) and [compare view](https://github.com/apache/tika/compare/1.18...1.22)).


Updates tika-translate from 1.18 to 1.22

Changelog and commits are identical to those listed for tika-core above (same [changelog](https://github.com/apache/tika/blob/master/CHANGES.txt) and [compare view](https://github.com/apache/tika/compare/1.18...1.22)).


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot ignore this [patch|minor|major] version` will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/Norconex/importer/network/alerts).

dependabot[bot] commented 4 years ago

Superseded by #107.