-
Progress has been made on text extraction from PDF.
It would be good to integrate a process like the ones used by https://github.com/VikParuchuri/marker and https://github.com/VikParuchuri/surya.
That wo…
-
Look at the PDF image grabber that Carolina mentioned.
-
**Describe the bug**
I have a 6-page PDF containing tables within images. LlamaParse extracts only 2 of the 6 pages, with no insight into why the other pages are missing.
Also, when I parse a PDF t…
-
Hello,
I am reaching out regarding my recent experience with pymupdf4llm. I have a PDF file that was created from a PowerPoint presentation, and I am attempting to extract specific text elements from…
-
-
I created a PDF from its DOCX version, in which sections and subsections were created with built-in heading styles instead of numbering. It fails to recognise a few subsections inside sections.
-
**Is your feature request related to a problem? Please describe.**
Image annotations are not fully flattened when exporting to PDF. For my use case, signing paperwork, this is a security concern.
…
-
Hello 👋
Now that many models support image input as part of the prompt, what do you think of `kor` having support for parsing data from images? I would love to try and put up a draft PR :)
The …
-
# Description
Testing out different Python PDF extraction libraries
# Outcome
Select PDF extraction service
-
### Description
Compiling a PDF with non-Latin (in this case, specifically Devanagari) text in it can sometimes result in strange text encoding. This results in text that is not properly selectable. T…