-
The library used by Tika already spots Welsh, but needs to be [taught](https://github.com/optimaize/language-detector#how-you-can-help) to spot [Scots Gaelic (gd)](https://en.wikipedia.org/wiki/Scotti…
-
# npm audit report
got
-
Yomu is great. I'm currently using it to process thousands of documents. Unfortunately, this is very slow, because, right now, Yomu starts the JVM for each document. This takes about 2 seconds per doc…
-
# Versioning does not update tika metaset
## Symptoms
* When content is modified and the object is checked in as the same version, the tika metaset gets updated correctly.
* When a new version is c…
-
Hi, I am getting the following error when trying to run the docker file
```bash
(dong) [conda] [lh599@corfu:nlm-ingestor]$ docker pull ghcr.io/nlmatics/nlm-ingestor:latest
Cannot connect to the Do…
```
-
It might be good to create a simple guide, with build steps, for installing Apache Tika in a Solr service.
-
File upload currently only allows text or PDF; we could use our Tika parser to enable other upload types.
As conversion would be done on the server, this would require adding a simple entrypoint to c…
-
In `ocd_backend.utils.file_parser` we use the python version of Apache Tika as a fallback when the mimetype is not 'application/pdf'. We use `pdfparser.poppler` as first choice since it has a native b…
-
The hope here is to get TikaOnDotNet fully configured to access Tesseract OCR for text extraction from images. With Tika .93 support for Tesseract was added, and we are now in the midst of validating…
-
On Tika we've gathered two quines with their creators' permissions. One is a zip file that when unzipped is exactly the same file; the other is a gz file with the same behavior.
I can't imagine DRO…