-
On Tika we've gathered two quines with their creators' permission. One is a zip file that, when unzipped, yields exactly the same file; the other is a gz file with the same behavior.
I can't imagine DRO…
-
If Tika has a parameter for a custom Tesseract OCR dictionary, add it, as was done for OCR of PDF images.
-
From [**quanteda** issue #380](https://github.com/kbenoit/quanteda/issues/380):
> Apache Tika (https://tika.apache.org/) might be useful.
> The KNIME folks just added that to their text mining nod…
-
### Description
We decided not to upgrade tika to 2.x on the 7.17 line because it could cause too many changes for a minor release (see https://github.com/elastic/elasticsearch/pull/86015). But it lo…
-
For a list of words given below we want to create a CSV file with the following columns:
- word
- top 5 documents where information value for the word has increased between plain and xml Tika
- top 5 …
-
Newbie here, so please pardon if I'm missing something:
I'm running the VM in Oracle VirtualBox under Windows 10 (all current versions).
I tried indexing a file (always a Microsoft Word docuem…
-
The very general rule used by Tika is to check whether the file matches any of the configured signatures. If a signature matches, glob pattern definitions are used to refine the MIME type to a subtype, if def…
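
The two-step rule described in this snippet (match a byte signature first, then narrow the result with a filename glob) can be sketched roughly as follows. This is only an illustration of the idea, not Tika's actual implementation: the `SIGNATURES` and `GLOB_SUBTYPES` tables and the `detect` function are hypothetical names made up for this example, and the fallback to `application/octet-stream` is an assumption.

```python
import fnmatch

# Hypothetical signature table: (magic bytes, offset, base MIME type).
SIGNATURES = [
    (b"PK\x03\x04", 0, "application/zip"),
    (b"%PDF-", 0, "application/pdf"),
    (b"\x1f\x8b", 0, "application/gzip"),
]

# Hypothetical glob refinements: base type -> list of (glob, subtype).
GLOB_SUBTYPES = {
    "application/zip": [
        ("*.docx", "application/vnd.openxmlformats-officedocument"
                   ".wordprocessingml.document"),
        ("*.jar", "application/java-archive"),
    ],
}


def detect(data: bytes, filename: str) -> str:
    """Return a MIME type using signature matching, then glob refinement."""
    name = filename.lower()
    for magic, offset, base in SIGNATURES:
        # Step 1: does the content carry this signature at the given offset?
        if data[offset:offset + len(magic)] == magic:
            # Step 2: refine to a subtype if a glob for this base type matches.
            for pattern, subtype in GLOB_SUBTYPES.get(base, []):
                if fnmatch.fnmatchcase(name, pattern):
                    return subtype
            return base
    # Assumed fallback when no signature matches.
    return "application/octet-stream"
```

For instance, under these assumed tables, `detect(b"PK\x03\x04...", "report.docx")` would be refined from the zip base type to the Word subtype, while the same bytes named `archive.zip` would stay `application/zip`.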
-
2023-01-13 13:18:48.416 [ApplicationServerQueuedThreadPool-44] INFO System.err - Caused by: java.lang.NoSuchFieldError: WORKBOOK_DIR_ENTRY_NAMES
2023-01-13 13:18:48.419 [ApplicationServerQueuedThread…
-
https://github.com/google/go-tika/blob/master/tika/server.go#L143
Need to be careful with server API changes. It would be OK to pass incompatible changes through this API.
https://wiki.apache.or…
-
ubuntu 18.04; java: openjdk version "1.8.0_222"; maven: 3.6.0
The source code is located at: https://github.com/apache/tika/archive/master.zip
mvn clean install stopped due to the following e…