text-extraction Search Results

olalha/Quizzer #2

Word text extraction

Create a function that extracts text from Word

olalha updated 1 month ago

Islandora/islandora #1064

[FEATURE] [BUG] Language selection for Tesseract/text extrac…

This is both a bug (language parameter not being passed to Tesseract when Tesseract has the ability to work in different languages) and a feature request (creating that behaviour in Islandora). **O…

bondjimbond updated 3 minutes ago

raycast/extensions #15526

[iLovePDF] extract text from pdf always includes Detailed Ex…

### Extension https://www.raycast.com/mohamedk1/ilovepdf ### Raycast Version 1.86.0 ### macOS Version 15.1.1 ### Description [iLovePDF] extract text from pdf always includes Detailed Extraction…

bigplayer-ai updated 1 day ago

ibm-granite-community/pm #126

Entity extraction and creation from unstructured text

adampingel updated 1 day ago

pdfminer/pdfminer.six #1056

Text extraction issue with extract_text_to_fp - Uncleaned CI…

Hello, while using the extract_text_to_fp function with the latest version of pdfminer.six, I've encountered an issue where CID characters (e.g., CID(123)) appear in the extracted text. These chara…

BaillySylvain updated 3 weeks ago

pymupdf/RAG #191

Extraction of text stops in the middle while working fine wi…

As an example, converting this PDF to Markdown https://cache.industry.siemens.com/dl/files/702/109768702/att_998757/v4/109768702_UserAdministration_WinCC_V7.5_en.pdf results in: ##### 4.2.1 Configurati…

sebastiaanvduijn updated 4 days ago

edubruell/tidyllm #35

Enhancing PDF extraction: multi-column layout and OCR

Hi Eduard, thank you for creating such a powerful package! I wonder if you plan to extend the PDF extraction functionality in `llm_message()` to automatically detect whether the PDF is multi-col…

JiaZhang42 updated 1 week ago

ckampfe/russ #28

HTML text extraction

### Is there an existing issue for this? - [X] I have searched the existing issues ### Feature description Some RSS feeds only include a small snippet of the article, or sometimes nothing at all. I…

mntn-xyz updated 2 months ago

yobix-ai/extractous #35

Support for Extracting PDF Content as XML

Hi, I’d like to use Extractous for my document processing tasks. I often need to extract PDF content as XML to retain structural information, such as page boundaries. This is a feature supported by Ap…

coroluca updated 1 day ago

anakib1/MangoTruth #12

PDF, DOCX formatter

Develop a formatter to parse PDF and DOCX files, extract text and tables while handling complex layouts. - [ ] Research methods of text extraction from PDF and DOCX. - [ ] Implement Basic Parsing …

Silence-o0 updated 1 week ago

1000+ results for text-extraction
