Open dsputnikk opened 6 months ago
I want to upload hundreds of PDFs and use them as the dataset.
The more documents there are, the lower the retrieval accuracy tends to be. May I ask why you need to upload so many documents? Are you conducting academic research?
A repository of large automotive service manuals
Got it. May I ask how many PDFs are in the repository and how many pages each PDF has on average? We have built an iOS application with workspaces, inspired by pdfGPT. Maybe we can help you solve this problem. Thanks for the feedback.
Right now, probably about a hundred. One of my technical challenges is using a vision-capable model for the ones that are just scans and not "real" PDFs.
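For anyone facing the same scan-versus-"real"-PDF split, one possible way to decide which files need a vision or OCR pass is to check how much text can be extracted per page. The sketch below is only a heuristic under stated assumptions: the function name `needs_ocr` and both thresholds are hypothetical, and it expects the caller to supply the per-page text already extracted with whatever PDF library is in use (for example PyMuPDF's `page.get_text()`).

```python
def needs_ocr(page_texts, min_chars=20, max_empty_ratio=0.5):
    """Guess whether a PDF is a scan, given the text extracted from each page.

    page_texts: list of strings, one per page (extraction is left to the caller).
    min_chars: pages with fewer non-whitespace-trimmed characters count as empty
               (hypothetical threshold, tune for your manuals).
    max_empty_ratio: if more than this share of pages is empty, treat the
                     document as a scan (hypothetical threshold).
    """
    if not page_texts:
        # No pages or no extractable text at all: send it down the OCR path.
        return True
    empty_pages = sum(1 for text in page_texts if len(text.strip()) < min_chars)
    return empty_pages / len(page_texts) > max_empty_ratio


if __name__ == "__main__":
    scanned = ["", "  ", "\n"]
    digital = ["Torque specifications for the cylinder head bolts ...", "x" * 200, ""]
    print(needs_ocr(scanned))  # mostly empty pages
    print(needs_ocr(digital))  # mostly text-bearing pages
```

Documents flagged this way could be routed to the vision-capable model, while the rest go through the normal text extraction path. Mixed manuals (text pages plus scanned diagrams) may need a per-page decision instead of a per-file one.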