bhaskatripathi / pdfGPT

PDF GPT allows you to chat with the contents of your PDF file by using GPT capabilities. The most effective open source solution to turn your PDF files into a chatbot!
https://huggingface.co/spaces/bhaskartripathi/pdfGPT_Turbo
MIT License
6.96k stars 838 forks

Any way to use a repository of PDF files on disk? #116

Open dsputnikk opened 6 months ago

dsputnikk commented 6 months ago

I want to upload hundreds of PDFs and use them as the dataset.

jasonxbliu commented 6 months ago

The more documents there are, the lower the information detection rate will be. I'd like to know, why do you need to upload so many documents? Are you conducting academic research?

dsputnikk commented 6 months ago

A repository of large automotive service manuals
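
For readers with the same setup, a minimal sketch of the "PDFs on disk" part: it only enumerates the files under a folder and groups them into batches for whatever ingestion step follows. This is not pdfGPT's own code; `find_pdfs`, `batched`, and the batch size are hypothetical names and values chosen for illustration.

```python
from pathlib import Path
from typing import Iterator, List


def find_pdfs(root: Path) -> List[Path]:
    """Return every file ending in .pdf (any letter case) under root, sorted by path."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def batched(paths: List[Path], size: int) -> Iterator[List[Path]]:
    """Yield consecutive groups of at most `size` paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield paths[start:start + size]
```

Processing the manuals in small batches keeps memory use bounded and makes it easy to resume after a failure; the ingestion step itself (text extraction, chunking, embedding) is left out because the thread does not specify it.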


jasonxbliu commented 6 months ago

Got it. May I know how many PDFs are in the repository and how many pages each PDF has on average? We have made an iOS application with workspaces inspired by pdfGPT. Maybe we can help you solve this problem. Thanks for the feedback.

dsputnikk commented 6 months ago

Right now, probably about a hundred. One of my technical challenges is using a vision-capable model for the ones that are just scans and not "real" PDFs.
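
One possible way to separate scans from "real" PDFs before choosing a model, sketched under stated assumptions: some text extractor has already produced one string per page, and a file whose pages mostly come back nearly empty is treated as a scan. The function name `looks_scanned`, the 20-character cutoff, and the 50% rule are all hypothetical and would need tuning against real manuals.

```python
from typing import List


def looks_scanned(page_texts: List[str], min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF is an image-only scan.

    page_texts holds the text an extractor returned for each page.
    The thresholds are illustrative assumptions, not measured values.
    """
    if not page_texts:
        return True  # nothing extractable at all
    # Count pages whose extracted text is shorter than the cutoff.
    sparse = sum(1 for t in page_texts if len(t.strip()) < min_chars_per_page)
    # Call it a scan only when more than half of the pages are sparse.
    return sparse / len(page_texts) > 0.5
```

Files flagged this way could be routed to OCR or a vision-capable model, while the rest go through ordinary text extraction.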
