TYPO3-Solr / ext-tika

A TYPO3 CMS extension that provides Apache Tika functionality
GNU General Public License v3.0

[TASK] Support for Apache Tika 2+ #218

Open dkd-kaehm opened 9 months ago

dkd-kaehm commented 9 months ago

What should be done in the scope of this task? We should provide compatibility with Apache Tika 2+.

cyberelk commented 4 months ago

What is the current status of this issue?

We have noticed that on a TYPO3 v11 installation with Tika app version 1.27, minor problems occur when reading file contents: from time to time the Tika processes hang for no apparent reason, and warnings are sporadically returned when files are read ("J2KImageReader not loaded. JPEG2000 files will not be processed." or "org.xerial's sqlite-jdbc is not loaded."). The same occurs with TYPO3 v12 and Tika 1.28.

For test purposes, we simply placed Tika version 2.9.2 alongside it. With that version the warnings no longer appear, and we could not detect any hanging processes. However, our tests in this regard were rather superficial.
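
A side-by-side check of this kind could be sketched roughly as below. This is only an illustration, not the procedure actually used here: the helper names, the jar paths, the sample file and the 60-second timeout are all assumptions. It relies on the Tika app's `--text` option and treats any stderr line containing "not loaded" as one of the warnings quoted above.

```python
import subprocess


def tika_text_command(jar_path, file_path, java="java"):
    """Build the Tika app command line for plain-text extraction (--text)."""
    return [java, "-jar", jar_path, "--text", file_path]


def run_with_timeout(cmd, timeout_s=60):
    """Run cmd and classify the outcome instead of blocking on a hung process."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing is left hanging.
        return {"status": "hung", "warnings": []}
    # Collect "... not loaded" style warnings that Tika writes to stderr.
    warnings = [line for line in proc.stderr.splitlines() if "not loaded" in line]
    status = "ok" if proc.returncode == 0 else "failed"
    return {"status": status, "warnings": warnings}


def compare_versions(jar_paths, file_path, timeout_s=60):
    """Run the same file through several Tika app jars (paths are hypothetical)."""
    return {
        jar: run_with_timeout(tika_text_command(jar, file_path), timeout_s)
        for jar in jar_paths
    }
```

For example, `compare_versions(["tika-app-1.27.jar", "tika-app-2.9.2.jar"], "sample.pdf")` would return one status and warning list per jar, which makes sporadic hangs easier to spot when repeated over many files.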

So the question now arises: from your point of view, what currently speaks against using Tika app version 2.9.2 in production with TYPO3 versions 11 and 12?

Kind regards, Jari

dkd-kaehm commented 4 months ago

Apache Solr uses Tika 1.x. We can choose any version of Tika if Apache Solr drops Tika Cell. See: #180

dkd-kaehm commented 4 months ago

@cyberelk Are you using Tika App or Tika Server mode?

cyberelk commented 4 months ago

At the moment we are using the app version.