-
We need to support PDF out of the box for ChatGPT / Claude
-
### Search before asking
- [X] I searched the [issues](https://github.com/eosphoros-ai/DB-GPT/issues?q=is%3Aissue) and found no similar issues.
### Operating system information
Linux
### P…
-
Error extracting text from document
## Environment
Which environment were you using when you encountered the problem?
```bash
$ python -m platform
Windows-11-10.0.22631-SP0
$ python -c "…
-
Add an example notebook that shows how to extract treatments from a PDF
-
Trying to extract text from one of the PDFs led to an error.
Additional info from when the error happened (see traceback later):
```
s.get_object() = {'/Filter': '/FlateDecode', '/Lengt…
-
### Search before asking
- [X] I searched the [issues](https://github.com/IBM/data-prep-lab/issues) and found no similar issues.
### Component
Tools/ingest2parquet
### What happened + What you ex…
-
Hi,
I want to run pdfplumber and doctr in the same pipeline.
That is, if pdfplumber doesn't extract any data, the results from doctr are used.
But if pdfplumber does extract data then doctr i…
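
The fallback described above could be sketched roughly as follows. This is a minimal illustration, not either library's recommended pipeline: `extract_with_fallback`, its `min_chars` threshold, and the two wrapper functions are hypothetical names introduced here, and the wrappers import pdfplumber and docTR lazily so the fallback logic can be exercised without either library installed.

```python
from typing import Callable, Tuple


def extract_with_pdfplumber(path: str) -> str:
    """Extract the embedded text layer with pdfplumber (imported lazily)."""
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        # extract_text() can return None or "" for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def extract_with_doctr(path: str) -> str:
    """Run docTR OCR over the rendered pages (imported lazily)."""
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)
    return model(DocumentFile.from_pdf(path)).render()


def extract_with_fallback(
    path: str,
    primary: Callable[[str], str] = extract_with_pdfplumber,
    fallback: Callable[[str], str] = extract_with_doctr,
    min_chars: int = 1,  # hypothetical threshold for "extracted any data"
) -> Tuple[str, str]:
    """Return (text, source); OCR runs only when the primary extractor finds nothing."""
    text = primary(path) or ""
    if len(text.strip()) >= min_chars:
        return text, "primary"
    return fallback(path) or "", "fallback"
```

With this shape the OCR model is loaded only for documents where pdfplumber returns no text, which keeps the expensive step off the common path.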
-
### Description of the bug
After installing in Docker on Windows 11, I ran `magic-pdf -p /home/data/12_Malovichko.pdf -o /home/data/output -m auto` and hit a CUDA error during the run. CUDA is reported as installed, but `nvcc -v` also fails.
PS C:\Users\AQUANAUT> docke…
-
Develop a formatter to parse PDF and DOCX files, extract text and tables while handling complex layouts.
- [ ] Research methods of text extraction from PDF and DOCX.
- [ ] Implement Basic Parsing …
-
Hi.
I am nearing the MAG classification step for samples extracted from soil around plant roots, but the number of MAGs seems to be only around 4 or 5. The sequencing was completed on a NovaSeq, s…