tika-python Search Results

572 results for tika-python

lebedov/python-pdfbox #18

Extracting order pre-definable?

Hi guys, just wondering whether the text extraction order for a PDF file can be defined? As pointed out [here](https://pdfbox.apache.org/2.0/faq.html#textorder), is there similar setting to adjust the …

luke4u updated 3 years ago
3
deeplearning4j/deeplearning4j #8339

Errors encountered building Tika-Master from the source

ubuntu 18.04; java: openjdk version "1.8.0_222"; maven: 3.6.0 The source codes are located at: https://github.com/apache/tika/archive/master.zip mvn clean install stopped due to the following e…

ipsmile updated 10 months ago
12
elastic/connectors #1369

[Gmail Connector] Email cleaning

### Problem Description Currently the connector extracts the entire raw email, which in most cases is insufficient for correct usage, especially when adding ML pipelines: ![image](https://…

llermaly updated 8 months ago
2
dadoonet/fscrawler #767

Ability to split documents per page so one elasticsearch ent…

When indexing large documents you may hit limits not only on the indexing part, but also when doing searches. Splitting documents into one entry per page helps slice up large documents into bite-s…

jawiz updated 4 years ago
6
PaddlePaddle/PaddleNLP #5421

[Question]: RequestsDependencyWarning: urllib3 (1.26.15) or …

### Please ask your question /Library/Python/3.9/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.15) or chardet (5.1.0)/charset_normalizer (3.1.0) doesn't match a supported version! …

MonCac updated 4 months ago
1
meramos/analyze_telegramgate #1

Install instructions for MacOS

Hello Maria, thank you for this very impressive work. I tried to run it on my Mac and had a few install steps to overcome, which I documented here: I tried to submit a pull request but I got den…

pedrobmorales updated 5 years ago
6
agrynchuk/noodle-ng #46

Media metadata crawler

``` Add a metadata crawler for multimedia files which gathers information about files present in the db. ``` Original issue reported on code.google.com by `hbwint...@gmail.com` on 11 Nov 2011 at 6:2…

GoogleCodeExporter updated 9 years ago
1
chrismattmann/tika-python #384

portions of strings getting cut off with "..."

Hi, I've gotten tika to work great for a while parsing PDFs - but realised recently that paragraphs longer than 240 characters or so (including spaces) are getting cut off/truncated. Is there any way…

BCorbeek updated 1 year ago
6
nlmatics/nlm-ingestor #77

nlm-ingestor installation is failing due to lxml, pandas, xx…

Hi, nlm-ingestor seems promising, but I wasn't able to get past the installation issue. I got the "**_ERROR: Failed to build installable wheels for some pyproject.toml based projects…

kcmuthyala updated 1 month ago
5
NCEAS/open-science-codefest #21

Automated metadata extraction

**Organizational Page**: [AutoMeta](https://github.com/NCEAS/open-science-codefest/wiki/AutoMeta) **Category**: Coding **Title**: Automatically extract metadata of R dataframes **Proposed by**: Ted H…

emhart updated 10 years ago
13
