VIDA-NYU / ache

ACHE is a web crawler for domain-specific search.
http://ache.readthedocs.io
Apache License 2.0

Bump tika-parsers from 1.18 to 2.4.1 #285

Status: Closed (dependabot[bot] closed this 2 years ago)

dependabot[bot] commented 2 years ago

Bumps tika-parsers from 1.18 to 2.4.1.

Changelog

Sourced from tika-parsers's changelog.

Release 2.4.1 - 06/14/2022

  • Implement bulk upload in the OpenSearch emitter (TIKA-3791).

  • Implement tika-server client via pipes mode (TIKA-3790).

  • Custom embedded parsers and EmbeddedDocumentHandlers can now add metadata to the container file's metadata (TIKA-3789).

  • Record embedded file exceptions in the container file's metadata (TIKA-3788).

  • Allow continuation of parsing after write limit has been reached (TIKA-3787).

  • Allow pass-through of 'Content-Length' header to metadata in TikaResource (TIKA-3786).

  • Add embedded depth to profiles tables in tika-eval (TIKA-3775).

  • Add stop() method to TikaServerCli so that it can be run with Apache Commons Daemon (TIKA-1570).

  • Fixed bug in ordering of Parsers during service loading (TIKA-3750).

  • Users can expand system properties from the forking process into forked tika-server processes (TIKA-3748).

  • Fix a few files being wrongly detected as EML (TIKA-3771).

  • Fix ignoreCharsets param of Icu4jEncodingDetector (TIKA-3774).

Release 2.4.0 - 04/23/2022

  • NOTE: To save on resources, we no longer include the deeplearning4j dependencies in the tika-dl jar. The dependencies for the tika-dl package must be provided by users. See: https://github.com/apache/tika/blob/main/tika-parsers/tika-parsers-ml/tika-dl/pom.xml for the dependencies that must be provided at run-time (TIKA-3676).

  • NOTE: Added prefix "dwg-custom:" to DWG custom metadata properties (TIKA-3731).

  • Add initial, BETA-grade TLS encryption option for tika-server; configuration may change in future releases (TIKA-3719).

  • Allow specification of fetcherName and fetchKey via query parameters in request URI in tika-server (TIKA-3714).

  • Add basic parsers for WARC and WACZ in tika-parsers-standard (TIKA-3697).

... (truncated)

Commits
  • aa3bfef [maven-release-plugin] prepare release 2.4.1-rc1
  • 98e9cf8 prep for 2.4.1 rc1
  • 0ea6571 TIKA-3792 -- only apply the handler decorator once for legacy xhtml processin...
  • e1892af TIKA-3790 -- fix unit test. sorry.
  • 47de04f TIKA-3779 -- make sure to close the temp stream in PDFParser and clean up aft...
  • 7877f9b TIKA-3790 -- actually implement tika server client via pipes (not yet async)
  • 18ce798 TIKA-3791 -- implement bulk updates in OpenSearch emitter.
  • ac6fe5b TIKA-3751: remove more of netty, update zookeeper to latest
  • 330dff3 TIKA-3751: remove netty, update zookeeper somewhat
  • fae27ea TIKA-3751: Update netty
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR

  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it

  • `@dependabot merge` will merge this PR after your CI passes on it

  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it

  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging

  • `@dependabot reopen` will reopen this PR if it is closed

  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

aecio commented 2 years ago

@dependabot ignore this major version

dependabot[bot] commented 2 years ago

OK, I won't notify you about version 2.x.x again, unless you re-open this PR or update to a 2.x.x release yourself.
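
For readers unfamiliar with what "ignore this major version" means in practice, here is a minimal sketch of the filtering rule that the reply above describes: once major version 2 is ignored, any 2.x.x candidate is skipped, while other major versions are still proposed. This is not Dependabot's actual implementation. The function names `parse_version` and `is_ignored` are hypothetical, the candidate version strings are illustrative inputs rather than a list of real releases, and the sketch assumes purely numeric dotted versions.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version such as '2.4.1' into integer parts.

    Assumption: versions are purely numeric (no '-rc1' style suffixes).
    """
    return tuple(int(part) for part in version.split("."))


def is_ignored(candidate: str, ignored_majors: set[int]) -> bool:
    """Return True if the candidate's major version is on the ignore list."""
    return parse_version(candidate)[0] in ignored_majors


# After '@dependabot ignore this major version' on a 2.4.1 bump,
# major version 2 is the one being ignored.
ignored_majors = {parse_version("2.4.1")[0]}

# Illustrative candidate versions (hypothetical inputs).
for candidate in ("1.19", "2.4.1", "2.5.0", "3.0.0"):
    action = "skip" if is_ignored(candidate, ignored_majors) else "propose"
    print(f"{candidate}: {action}")
```

Under this rule, a later 1.x update could still be proposed, which matches the stated intent of ignoring only the 2.x.x line.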