Closed sanesanyo closed 1 year ago
This would probably be a plugin that uses a Python PDF parsing library, analogous to pdf2text. (Not sure how to label the issue, or whether I lack the permissions to do so.)
Agreed with @Boostrix on this one. PDF parsing is an extraneous task, and isn't as straightforward as it ought to be. It would be better to assign that to developers who are skilled in PDF parsing.
There already is PR #3031 which supports plain text based PDF processing.
That would also make it possible to support arguments, such as searching a PDF file by author, date, or page range, and returning a list of matching pages.
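A rough sketch of what such a page-level search could look like, assuming the PDF text has already been extracted into one string per page. The names `PageMatch` and `search_pages` are hypothetical and are not part of Auto-GPT or PR #3031. Filtering by author or date would need document metadata and is left out here.

```python
from dataclasses import dataclass


@dataclass
class PageMatch:
    page: int      # 1-based page number
    snippet: str   # short excerpt around the first hit on that page


def search_pages(pages, query, first_page=1, last_page=None, context=40):
    """Return one PageMatch per page in [first_page, last_page] that contains query.

    `pages` is a list of already-extracted page texts (hypothetical input;
    the extraction step itself is not shown). Matching is case-insensitive.
    """
    if last_page is None:
        last_page = len(pages)
    needle = query.lower()
    matches = []
    for number, text in enumerate(pages, start=1):
        if number < first_page or number > last_page:
            continue
        pos = text.lower().find(needle)
        if pos == -1:
            continue
        # Keep a little text on both sides of the hit for context.
        start = max(0, pos - context)
        end = pos + len(query) + context
        matches.append(PageMatch(number, text[start:end].strip()))
    return matches
```

Returning page numbers plus a short snippet keeps the result small enough to hand back to the model, which can then ask for the full text of only the pages it needs.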
A higher-level command would probably be an adaptation of browse_website, or a way to search specifically for PDF files using different search engines/APIs (think research servers, as per #826), as per: https://github.com/Significant-Gravitas/Auto-GPT/issues/503#issuecomment-1534094916
Probably covered by #2730
Plugin candidate, once the dust settles with #3652
This issue was closed automatically because it has been stale for 10 days with no activity.
Duplicates
Summary 💡
When crawling the web to do market research, many of the links turn out to be PDF documents. It would be great if Auto-GPT had a built-in ability to parse those PDFs and feed the text to GPT-4 to analyse.
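To illustrate the idea, here is a minimal sketch of how a crawler could detect that a fetched link is a PDF and route it to a text extractor instead of the HTML path. The function names `looks_like_pdf` and `page_text` are hypothetical, and the two extractors are passed in as callables so that no particular PDF library API is assumed.

```python
from typing import Callable

# Every PDF file starts with this marker, followed by the version number.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(body: bytes, content_type: str = "") -> bool:
    """Guess whether a fetched response is a PDF.

    Checks the Content-Type header first, then falls back to the file
    signature, since servers sometimes send a generic content type.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/pdf" or body.startswith(PDF_MAGIC)


def page_text(
    body: bytes,
    content_type: str,
    pdf_to_text: Callable[[bytes], str],
    html_to_text: Callable[[bytes], str],
) -> str:
    """Route a response to the right extractor (both are caller-supplied)."""
    if looks_like_pdf(body, content_type):
        return pdf_to_text(body)
    return html_to_text(body)
```

In practice `pdf_to_text` would wrap whichever parsing library the plugin settles on, and the returned text would still need to be chunked to fit the model's context window.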
Examples 🌈
Motivation 🔦
This way, Auto-GPT could handle market research tasks far better than it currently does.