Removed dangerous calls that read an InputStream or convert to bytes
without specifying a charset.
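Calls of this kind include `new InputStreamReader(stream)` and `String.getBytes()`, which depend on the JVM's default charset. As a minimal sketch using only the JDK (the class and method names are illustrative and not taken from Tika), the explicit-charset equivalents might look like this:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Illustrative helper (hypothetical name, not part of Tika).
final class CharsetExample {

    // Decode a stream with an explicit charset instead of the JVM default.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Encode with an explicit charset instead of the no-argument String.getBytes().
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = toBytes("caf\u00e9");
        // Round trip: the decoded text equals the original on any platform.
        System.out.println(readAll(new ByteArrayInputStream(bytes)).equals("caf\u00e9"));
    }
}
```

UTF-8 is used here only as an example; the right charset depends on the data being read.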
Parsers can be configured via tika-config.xml on instantiation.
We have moved away from configuration via .properties files because
of confusion among users. This affects the PDFParser, TesseractOCRParser
and the StringsParser.
For those parsers that can be configured per parse via a config object
passed in through the ParseContext, the config object will only update those fields
that the user has modified. The config object will no longer
fully reset all settings to the default settings per parse.
This gives the more intuitive behavior of updating the base/configured
settings with only what has been changed in the config object.
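The difference between "reset to defaults" and "update only modified fields" can be sketched in plain Java. This is an illustration of the merge semantics only: `ParserSettings` and the keys `language` and `timeoutSeconds` are hypothetical and do not reflect Tika's actual config classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical settings holder; Tika's real config classes differ.
final class ParserSettings {
    // Only keys the caller explicitly set are present in this map.
    private final Map<String, Object> values = new HashMap<>();

    ParserSettings set(String key, Object value) {
        values.put(key, value);
        return this;
    }

    Object get(String key) {
        return values.get(key);
    }

    // Start from the base (configured) settings, then apply only the
    // fields that were explicitly set on the per-parse object.
    static ParserSettings merge(ParserSettings base, ParserSettings perParse) {
        ParserSettings merged = new ParserSettings();
        merged.values.putAll(base.values);
        merged.values.putAll(perParse.values);
        return merged;
    }

    public static void main(String[] args) {
        ParserSettings base = new ParserSettings()
                .set("language", "fra")
                .set("timeoutSeconds", 120);
        // The per-parse object touches one field only.
        ParserSettings perParse = new ParserSettings().set("timeoutSeconds", 300);

        ParserSettings merged = merge(base, perParse);
        System.out.println(merged.get("language"));       // prints "fra"
        System.out.println(merged.get("timeoutSeconds")); // prints "300"
    }
}
```

Under the older behavior described above, the untouched `language` value would have fallen back to its default instead of keeping the configured `fra`.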
tika-parsers
The parser modules have been broken into three main modules:
tika-parsers-classic, tika-parsers-extended and tika-parsers-advanced.
Users may now need to add tika-parsers-extended to tika-app and
tika-server to include parsers that used to be included by default
(for example: envi, gdal, grib, isatab, netcdf).
ChmParser was moved to org.apache.tika.parser.microsoft.chm
RTFParser was moved to org.apache.tika.parser.microsoft.rtf
tika-app
tika-server
tika-server now by default forks a child process and runs all parsing
in it (this was the -spawnChild option in tika-1.x). Clients must now
expect that tika-server will restart on OOM, timeouts, crashes or after
parsing a large number of files. While it restarts, tika-server will not
accept connections for brief periods. The less robust, legacy behavior
of not forking a process is available with "-noFork".
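One way a client could tolerate these brief outages is to retry failed requests with a growing delay. The helper below is a sketch under that assumption, using only the JDK; `RetryOnRestart` and `withRetry` are hypothetical names, and the request itself is passed in as a `Callable` so that no particular HTTP client is assumed.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.concurrent.Callable;

// Hypothetical client-side helper; not part of Tika.
final class RetryOnRestart {

    // Runs the request, retrying on IOException (for example a refused
    // connection while the server restarts) and doubling the delay each time.
    static <T> T withRetry(Callable<T> request, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;
                }
            }
        }
        throw last; // every attempt failed
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated request: fails twice as if the server were restarting.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new ConnectException("simulated restart");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "ok after 3 attempts"
    }
}
```

The attempt limit matters: a file that triggered the restart may well trigger it again, so retrying the same request indefinitely is unlikely to help.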
tika-server's /metadata endpoint requires tika-server-classic to write XMP/rdf output.
This output is not available in tika-server-core.
Release 1.27 - ??
Apply encoding detection to zip entry names via Ryan421 (TIKA-3374).
Add json output for /tika endpoint in tika-server (TIKA-3352).
Tika's OpenNLPDetector now covers 148 languages and language-script pairs (TIKA-3340).
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/adriens/person-counter-api/network/alerts).
Bumps tika from 1.24.1 to 1.25.
Changelog
Sourced from tika's changelog.
... (truncated)
Commits
- `0090eba` [maven-release-plugin] prepare release 1.25-rc2
- `a464047` roll back for 1.25-rc2, update release date
- `3dc1e8b` roll back for 1.25-rc2
- `65744e7` Updated CHANGES.txt with details on TIKA-3189 and TIKA-3227
- `1abf0eb` fix whitespace
- `2e89e4c` update README.txt from main branch
- `d4e607a` [maven-release-plugin] prepare for next development iteration
- `760aa4a` [maven-release-plugin] prepare release 1.25-rc1
- `2744672` Fix license issues identified via rat check
- `2775afe` Update README for 1.25 release