-
### Feature Name
swarmauri_community/parsers/concrete/TabulaPDFParser.py
### Feature Description
Using Tabula, extract tables from PDF files
### Motivation
To enable parsing of PDF documents
###…
-
On large tables, the header is skipped. Is there a way to disable this behaviour? If not, how can I add the header back?
```
Invoking large table row guess! set TATRFormatConfig.force_large_table_…
-
In `nlmExtractTables`, we store the emission tables in the vector DB twice.
https://github.com/Klimatbyran/garbo/blob/649e8c4a1edc8adb04e2aeafff8681c08910194e/src/workers/nlmExtractTables.ts#L1…
-
First, thank you for making this lib!
However, I'm unable to extract headers properly, and your help would be much appreciated. The first data row is always treated as the header in this example. Am I doing…
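A possible workaround, sketched under assumptions: if the extracted table is available as a header list plus data rows, the misdetected header can be pushed back into the data and replaced with placeholder column names. The helper `demote_header` and its `header`/`rows` arguments are hypothetical and not part of this library's API.

```python
from typing import List, Sequence, Tuple


def demote_header(
    header: Sequence[str], rows: Sequence[Sequence[str]]
) -> Tuple[List[str], List[List[str]]]:
    """Treat a misdetected header as the first data row.

    Hypothetical helper: returns placeholder column names
    ("col_0", "col_1", ...) and the rows with the old header prepended.
    """
    placeholder = [f"col_{i}" for i in range(len(header))]
    restored = [list(header)] + [list(r) for r in rows]
    return placeholder, restored
```

The real column names can then be assigned by hand once the data rows are intact.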
-
### Requested feature
-
Hello,
I’m encountering an issue when extracting tables containing merged rows. Specifically, when a cell spans multiple rows, the expected behavior is to assign it a `row_span` value greater than …
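To make the expected behavior concrete, here is a minimal sketch of how a row span could be derived from a cell grid. It assumes a rectangular grid in which a cell continuing a vertical merge from the row above is represented as `None`; that representation and the helper name `compute_row_spans` are illustrative assumptions, not the library's internal model.

```python
from typing import Dict, List, Optional, Tuple


def compute_row_spans(
    grid: List[List[Optional[str]]],
) -> Dict[Tuple[int, int], int]:
    """Map each anchor cell (row, col) to its row span.

    Assumption: None marks a cell that continues the merge started by the
    nearest non-None cell above it in the same column. Any other value,
    including the empty string, starts a new cell.
    """
    if grid and any(len(r) != len(grid[0]) for r in grid):
        raise ValueError("grid must be rectangular")

    spans: Dict[Tuple[int, int], int] = {}
    n_cols = len(grid[0]) if grid else 0
    for col in range(n_cols):
        anchor = None  # top cell of the merge currently being extended
        for row, cells in enumerate(grid):
            if cells[col] is not None:
                anchor = (row, col)
                spans[anchor] = 1
            elif anchor is not None:
                spans[anchor] += 1
    return spans
```

With this representation, a cell merged over two rows yields a span of 2 for its top cell, and the continuation cell gets no entry of its own.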
-
There's a long-standing practice in PDF that XMP Metadata streams should not be compressed, but there is no note to this effect. So this issue raises two questions:
1. Is it still considered best p…
-
### Search before asking
- [X] I have searched the [issues](https://github.com/eosphoros-ai/DB-GPT/issues?q=is%3Aissue) and found no similar feature requirement.
### Description
1. At present, t…
-
Hi. I'm trying to get some kind of bounding box alignment between the PDF (text extraction) method below and PyMuPDF's bounding boxes.
The Img2TableImage module's bounding box is reasonably accurat…
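One common source of mismatch is units: boxes detected on a rendered page image are in pixels, while PDF coordinates are in points (1/72 inch). The sketch below rescales a pixel box to points. It assumes the image was rendered from the full page at a known DPI with no cropping or rotation, and that both boxes use a top-left origin. The helper name `pixels_to_points` is hypothetical.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

PDF_POINTS_PER_INCH = 72.0


def pixels_to_points(bbox_px: BBox, dpi: float) -> BBox:
    """Rescale a pixel bounding box to PDF points.

    Assumes the source image covers the whole page, unrotated, and was
    rendered at `dpi` dots per inch.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    scale = PDF_POINTS_PER_INCH / dpi
    x0, y0, x1, y1 = bbox_px
    return (x0 * scale, y0 * scale, x1 * scale, y1 * scale)
```

If the boxes still disagree after rescaling, the remaining offset may come from page rotation or a crop box, which this sketch does not handle.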
-
I'm using a custom Layout Parser model, which is registered and has text, title, table, ... as categories.
I am trying to use the pdfplumber detector and textextractionservice.
Code:
```
…