Enable counting and/or parsing of incremental updates in PDFs. This
is an experimental feature and may change in later releases (TIKA-4017).
Fixed bug that prevented the loading of CompositeExternalParser in tika-app and
tika-server-standard. This parser will call exiftool and ffmpeg if those are installed, as was
the behavior in Tika 1.x. Exclude org.apache.tika.parser.external.CompositeExternalParser
if you do not want this behavior (TIKA-4022).
Removed the shading of tika-parsers-standard-module (TIKA-4038).
Enable optional extraction of file system metadata in FileSystemFetcher (TIKA-4035).
Allow pretty printing in FileSystemEmitter (TIKA-4034).
Add detection of, and a new mime type for, older PostScript-based
Adobe Illustrator "application/illustrator+ps" files (TIKA-3971).
Add magic detection for Canon raw file types: crw, cr2 and cr3 (TIKA-3991).
Add detection for ONIX message files (TIKA-4011).
Add detection and a parser for ActiveMime files (TIKA-3987).
Add extraction of rendition layout value and version from EPUB (TIKA-4013).
Improve embedded file extraction from PDFs (TIKA-4012).
Improve metadata extraction from WARCs (TIKA-4018).
Update to PDFBox 2.0.28 (TIKA-4016).
Users may now avoid the ZeroByteFileException via a
setting on the AutoDetectParserConfig (TIKA-3976).
Fix bug in closing elements in the presence of elements
in RTF files (TIKA-3972).
Improve extraction of embedded file names in .docx (TIKA-3968).
Normalize author, title, subject and description to their Dublin Core
properties in the HTMLParser (TIKA-3963).
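
To illustrate what "magic detection" means for the Canon raw entry above (TIKA-3991), here is a minimal sketch in plain Java. It is not Tika's implementation: the class `CanonRawSniffer` and its methods are hypothetical, and the signatures it checks are the commonly documented file headers (a TIFF header plus a `CR` marker for cr2, the `HEAPCCDR` CIFF signature for crw, an ISO BMFF `ftyp` box with brand `crx ` for cr3). The rules Tika actually ships may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of Tika. */
final class CanonRawSniffer {

    private CanonRawSniffer() {
    }

    /** Returns "crw", "cr2" or "cr3" when the leading bytes match a known signature. */
    static Optional<String> sniff(byte[] head) {
        // cr2: little-endian TIFF header "II*\0", then "CR" and major version 2 at offset 8.
        if (matchesAt(head, 0, new byte[] {'I', 'I', 0x2A, 0x00})
                && matchesAt(head, 8, new byte[] {'C', 'R', 0x02})) {
            return Optional.of("cr2");
        }
        // crw (CIFF): byte order "II", a 4-byte header length, then "HEAPCCDR" at offset 6.
        if (matchesAt(head, 0, ascii("II")) && matchesAt(head, 6, ascii("HEAPCCDR"))) {
            return Optional.of("crw");
        }
        // cr3: ISO base media file; "ftyp" at offset 4 and major brand "crx " at offset 8.
        if (matchesAt(head, 4, ascii("ftyp")) && matchesAt(head, 8, ascii("crx "))) {
            return Optional.of("cr3");
        }
        return Optional.empty();
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    /** True when sig occurs in data starting exactly at offset. */
    private static boolean matchesAt(byte[] data, int offset, byte[] sig) {
        if (data.length < offset + sig.length) {
            return false;
        }
        for (int i = 0; i < sig.length; i++) {
            if (data[offset + i] != sig[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The cr2 check runs before any generic TIFF handling would, because a cr2 file is also a valid TIFF container; a plain TIFF without the `CR` marker falls through to `Optional.empty()`.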
Release 2.7.0 - 1/31/2023
Add SVG detection for svg files that lack the xml header (TIKA-3308).
Migrate to a live fork of Universal Charset Detector (TIKA-3213).
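
As a rough illustration of the SVG entry above (TIKA-3308), the sketch below shows one way to recognize an SVG document whose XML declaration is missing: skip any optional prolog and test whether the first element is `svg`. This is an assumption about the general technique, not Tika's detector; `SvgSniffer` and `looksLikeSvg` are hypothetical names, and the DOCTYPE handling is simplified (it does not cope with an internal subset containing `>`).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of Tika. */
final class SvgSniffer {

    private SvgSniffer() {
    }

    /** True when the first element in the leading bytes is an svg element. */
    static boolean looksLikeSvg(byte[] head) {
        String s = new String(head, StandardCharsets.UTF_8);
        if (s.startsWith("\uFEFF")) {
            s = s.substring(1); // drop a byte order mark
        }
        s = s.stripLeading();
        // Skip an optional XML declaration, processing instructions, comments and a DOCTYPE.
        while (true) {
            String end;
            if (s.startsWith("<?")) {
                end = "?>";
            } else if (s.startsWith("<!--")) {
                end = "-->";
            } else if (s.startsWith("<!DOCTYPE")) {
                end = ">";
            } else {
                break;
            }
            int i = s.indexOf(end);
            if (i < 0) {
                return false; // prolog not closed within the sniffed bytes
            }
            s = s.substring(i + end.length()).stripLeading();
        }
        // Require "<svg" followed by whitespace, ">" or "/" so "<svgfoo>" is rejected.
        return s.startsWith("<svg")
                && (s.length() == 4 || " \t\r\n>/".indexOf(s.charAt(4)) >= 0);
    }
}
```

A real detector would normally read only a bounded prefix of the stream and would also need to handle other encodings; this sketch assumes UTF-8 input.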
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Bumps tika-core from 2.7.0 to 2.8.0.
Changelog
Sourced from tika-core's changelog.
... (truncated)
Commits
- `656971f` [maven-release-plugin] prepare release 2.8.0-rc2
- `fd27103` Update CHANGES.txt and rollback dev version for 2.8.0-rc2
- `ef8c8ff` Remove shading of tika-parsers-standard-package (#1130)
- `6a93b54` Merge pull request #1127 from apache/dependabot/maven/test.containers.version...
- `93d824a` Merge pull request #1128 from apache/dependabot/maven/com.google.cloud-google...
- `49e5970` Bump google-cloud-storage from 2.22.1 to 2.22.2
- `4b6d797` Merge pull request #1129 from apache/dependabot/maven/aws.version-1.12.467
- `fab540d` Bump aws.version from 1.12.466 to 1.12.467
- `c12e825` Bump test.containers.version from 1.18.0 to 1.18.1
- `5323f9e` TIKA-4037 -- add detection for os2 bitmap arrays.