-
### Search before asking
- [X] I searched the [issues](https://github.com/eosphoros-ai/DB-GPT/issues?q=is%3Aissue) and found no similar issues.
### Operating system information
Linux
### P…
-
Extracting text from one of the PDFs raised an error.
Additional info from when the error happened (see the traceback below):
```
s.get_object() = {'/Filter': '/FlateDecode', '/Lengt…
-
Add an example notebook that shows how to extract treatments from a PDF
-
I am working on a project that involves providing LightRAG with hundreds of PDFs for queries. I want to ensure that the data is processed efficiently and accurately.
1. What is the optimal format fo…
-
### Search before asking
- [X] I searched the [issues](https://github.com/IBM/data-prep-lab/issues) and found no similar issues.
### Component
Tools/ingest2parquet
### What happened + What you ex…
-
### Description of the bug
After installing in Docker on Windows 11, I ran `magic-pdf -p /home/data/12_Malovichko.pdf -o /home/data/output -m auto` and hit a CUDA error partway through. CUDA appears to be installed, but `nvcc -v` fails.
PS C:\Users\AQUANAUT> docke…
-
Develop a formatter to parse PDF and DOCX files, extract text and tables while handling complex layouts.
- [ ] Research methods of text extraction from PDF and DOCX.
- [ ] Implement Basic Parsing …
-
```
[](https://localhost:8080/#) in extract_data_from_pdf(pdf_path)
57 # Function to extract text using the unstructured library
58 def extract_data_from_pdf(pdf_path):
---> 59 eleme…
-
I tried extracting data from a PDF containing the image below.
![image](https://github.com/user-attachments/assets/7d8b668a-d8b3-4c08-80b5-f77e7f93ad7c)
However, the result was
![image](https://git…
-
I am getting the following error:
```
TypeError: can't multiply sequence by non-int of type 'float'
```
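For context, Python raises this message whenever a sequence (a `str`, `list`, or `tuple`) is multiplied by a `float`. The trace below is truncated, so the actual cause in this report is unknown; the following is only a minimal sketch of the general pattern, and `scale` and its arguments are hypothetical names, not taken from the failing code.

```python
def scale(value, factor):
    """Multiply value by factor (hypothetical helper for illustration)."""
    return value * factor


# A string is a sequence, so multiplying it by a float fails.
try:
    scale("12", 1.5)
except TypeError as exc:
    message = str(exc)

# Converting the value to a number first avoids the error.
result = scale(float("12"), 1.5)
```

If the failing call receives a value read from a file or a config, checking its type just before the multiplication is usually the quickest way to confirm whether this pattern applies.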
Full error trace log
```
File "/home/siddhant/Desktop/deployment_extra/chatbot/dobby-be-pre…