Filimoa / open-parse

Improved file parsing for LLM’s
https://filimoa.github.io/open-parse/
MIT License
2.34k stars, 89 forks

Is the purpose of this project to interpret and comprehensively analyze the content of PDF documents? #42

Closed (Bruce337f closed this issue 4 months ago)

Bruce337f commented 4 months ago

Description

When I installed the project and ran the example code, I easily obtained the text content of the PDF. This is a very convenient project!

What puzzles me is that the developer also provides sample code that uses OpenAI. Does this mean OpenAI can be used to generate a summary of the PDF content, or to analyze its themes?

Bruce337f commented 4 months ago

I would also like to ask whether the following projects, which are listed in the README, can edit PDFs and their specific document content in detail, and how they relate to this project.

Dealing with PDFs:

pdfminer.six: Fully open source.

Extracting tables:

PyMuPDF: Has some table detection functionality. Please see their license.

Table Transformer: A deep learning approach.

unitable: Another transformers-based approach with state-of-the-art performance.

This is a good project!