-
Hi team,
We are trying to use the library to extract data from PDF files, but the extracted text has no spaces between words, which makes it unusable. Is there a way to fix this?
Regards
Mandeep
-
**Describe the bug**
When trying to override the `tika_url` parameter of the `TikaConverter` file converter in a YAML pipeline configuration file via an environment variable, the parameter is not s…
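For readers unfamiliar with this kind of override, here is a minimal sketch of the general idea: a value loaded from a YAML file is replaced when a matching environment variable is set. The `PIPELINE_` prefix, the `resolve_param` helper, and both URLs are hypothetical and chosen only for illustration; they are not the library's actual naming convention or lookup logic.

```python
import os


def resolve_param(name, yaml_value, env=None, prefix="PIPELINE_"):
    """Return the environment override for `name` if one is set, else the YAML value.

    The prefix and the upper-casing rule are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    return env.get(prefix + name.upper(), yaml_value)


# Parameters as they might look after loading the YAML file (illustrative values).
params = {"tika_url": "http://localhost:9998/tika"}

# A stand-in for os.environ so the example does not depend on the real environment.
fake_env = {"PIPELINE_TIKA_URL": "http://tika:9998/tika"}

resolved = {key: resolve_param(key, value, env=fake_env) for key, value in params.items()}
print(resolved["tika_url"])
```

If the override is silently ignored, comparing the variable name the loader actually looks up against the name that was exported is usually the first thing to check.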
-
Yuan's extracted feature vectors and predictions are here:
`mlia-fn:/home/yzhuang/jsre-examples/Merged/test-goldentity`
-
The NLP Scholar dataset seems to contain only paper titles, with no other text from the papers (e.g. abstracts). We want to complete the dataset by adding the abstract for each paper.
- [x] Che…
-
Hi Team,
I have installed python-tika-1.14 on a Linux (Ubuntu 16.04) box running in the cloud. While executing this code:
>>> import tika
>>> from tika import parser
>>> parsed = parser.from_file('…
-
**Describe the bug**
`pdftotext` does not seem to work in the GPU Docker image.
**Error message**
File "/home/user/rest_api/controller/router.py", line 3, in
…
-
I usually run my Python projects in a Docker environment, where the paths differ from my local paths (Sublime is running locally), e.g.: `/apps/project1/repo1` `/home/US…
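To make the mismatch concrete, here is a minimal sketch of translating a path reported inside the container into its local equivalent. The `map_path` helper and the local root `/home/dev/repo1` are hypothetical; only `/apps/project1/repo1` comes from the report above.

```python
from pathlib import PurePosixPath


def map_path(path, container_root, local_root):
    """Rewrite `path` from the container's root to the local root.

    Paths outside `container_root` are returned unchanged.
    """
    p = PurePosixPath(path)
    try:
        relative = p.relative_to(container_root)
    except ValueError:
        # Not under the container root, so there is nothing to translate.
        return str(p)
    return str(PurePosixPath(local_root) / relative)


# "/home/dev/repo1" is an assumed local checkout location, used only for illustration.
local = map_path(
    "/apps/project1/repo1/src/main.py",
    "/apps/project1/repo1",
    "/home/dev/repo1",
)
print(local)
```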
-
**Describe the bug**
When I click on "Make", I get a "Failed to load PDF file." error message
**To Reproduce**
Steps to reproduce the behavior:
1. Go to https://latexresu.me/generator/
2. Click…
-
Hi there,
Is it possible to suppress INFO messages, such as:
```
2021-05-07 09:57:38,745 [MainThread ] [INFO ] Retrieving URL_HERE to /tmp/filestream.ashx.
```
I tried setting some env va…
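One general approach, independent of environment variables, is to raise the level of the logger that emits these messages using Python's standard `logging` module. The sketch below assumes the messages come from a standard `logging` logger; the name `noisy_library` and the `quiet_logger` helper are hypothetical placeholders, not the library's actual logger name.

```python
import logging


def quiet_logger(name, level=logging.WARNING):
    """Raise the threshold of the named logger so lower-level records are dropped."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    return logger


# "noisy_library" is a placeholder; substitute the logger name the library really uses.
noisy = quiet_logger("noisy_library")

print(noisy.isEnabledFor(logging.INFO))     # INFO records are now filtered out
print(noisy.isEnabledFor(logging.WARNING))  # warnings and errors still pass
```

If the logger name is unknown, inspecting `logging.root.manager.loggerDict` after importing the library lists the loggers that have been created.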
-
Hey,
Following @jvanz's request in #195, this is an issue to discuss parsers for extracting information from the downloaded gazette files.
In that PR I made a parser prototype for a spider (PE - São…