-
Hi,
When extracting PDF tables with Camelot (Python), any bold text in a table appears multiple times in the JSON output. I cannot figure out why this happens.
Is there any parameter we can set…
-
Hi Eduard,
Thank you for creating such a powerful package!
I wonder if you plan to extend the PDF extraction functionality in `llm_message()` to automatically detect whether the PDF is multi-col…
-
Trying to extract tabular data (table is embedded as an image) from a PDF file. While I've managed to extract some data, there are consistent errors when the table is located at the bottom of the PDF.…
-
### Issue: Comparing GROBID and Docling for Parsing Scholarly Publications
#### **My Use Case**
We need to parse and extract all relevant information from thousands of scholarly publications, such…
-
Hi,
I use `pdfWriter = muhammara.createWriterToModify(localPdfPath,{modifiedFilePath:destPdfPath});` to create my pdfWriter so I can read and add annotations. It worked perfectly until now, when…
-
Hi! Not sure if this is a bug or a feature, but I'd love to use the `ai_extraction` option to improve the handling of PDF documents. However, enabling this option overwrites the `local=True` option.
…
-
**Bug report**
I'm working on a PDF parsing project.
I have created an AI model that finds and extracts all the tables in a PDF. Now I just need a way to get the raw text without layout and tables…
-
When processing a PDF file with hi_res in `unstructured-api`, an error occurs during HTML table generation (from `unstructured-inference`):
```
2024-07-24T08:49:18.887448624Z File "/home/notebook-user/…
-
Dec 2023 - March 2024
Subtask 2.1:
- [x] #89
Subtask 2.2: Linking extractions
- [ ] Implement a model identified in Subtask 2.1 to link together extractions within document (e.g., equation to tab…
-
### Requested feature
Enhanced table extraction for complex table formats. Currently, Docling is able to identify the values correctly, but formatting is sometimes misaligned or unclear, especially i…