Open ayoubelmhamdi opened 1 year ago
AYOUB, thanks for taking a look at my code! Can you provide the name of the file you're referring to - I'll go take a look
here you give top_k=5
https://github.com/beverm2391/ai-summarizer/blob/600dd69cbd6c3fb83a2397d23fdc610a2efb96ed/package/mongodoc.py#L24
Eventually, a prompt consisting of the question plus five chunks is sent to ChatGPT. The recommended approach for summarization right now is the LangChain-style method: split the PDF into chunks as close to the maximum token limit as feasible, generate a summary of each chunk, and then combine those partial summaries. This produces a comprehensive summary covering all parts of the document.
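As a rough illustration of that map-reduce idea, here is a minimal sketch. All names (`split_into_chunks`, `summarize`, `map_reduce_summary`) are hypothetical, the `summarize` function is a stand-in for a real ChatGPT call, and "tokens" are approximated by words for simplicity:

```python
# Sketch of map-reduce summarization: split the document into chunks that
# each fit the model's context window, summarize every chunk (map), then
# summarize the combined partial summaries (reduce).

def split_into_chunks(words, max_tokens):
    """Greedily pack words into chunks of at most max_tokens words each."""
    return [words[i:i + max_tokens] for i in range(0, len(words), max_tokens)]

def summarize(text):
    # Placeholder for an LLM call; here we just keep the first few words.
    return " ".join(text.split()[:5])

def map_reduce_summary(document, max_tokens=100):
    words = document.split()
    chunks = [" ".join(c) for c in split_into_chunks(words, max_tokens)]
    partial = [summarize(c) for c in chunks]   # map step: one summary per chunk
    return summarize(" ".join(partial))        # reduce step: summary of summaries
```

Because every chunk is summarized, no section of the PDF is skipped, which is the key difference from retrieval-based prompting.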
I hope this message finds you well. I wanted to bring to your attention a potential issue I noticed while exploring your repository. Specifically, you are using `embedding` to get vectors for the chunks of each part of the PDF, then embedding the question and taking the `top_k=3` chunks by the dot-product score between the question vector and the chunk vectors. This is an interesting approach, but for a summarization question it will retrieve only the chunks that talk about summarizing in the PDF, so the prompt sent to ChatGPT is the question plus 3 chunks from the PDF. In practice this means sending `garbage` to ChatGPT. I just wanted to bring this to your attention in case it was unintentional, or in case there is a better way to approach it. Thank you for your time and for sharing your work on this repository.
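To make the retrieval step being discussed concrete, here is a minimal sketch of dot-product `top_k` selection. The vectors are made up for illustration, not real embeddings, and `top_k_chunks` is a hypothetical helper, not the repository's actual function:

```python
# Score each chunk embedding by its dot product with the question
# embedding and keep the indices of the top_k highest-scoring chunks.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k_chunks(question_vec, chunk_vecs, k=3):
    scores = [(dot(question_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scores.sort(reverse=True)               # highest similarity first
    return [i for _, i in scores[:k]]
```

The issue described above follows directly: if the question is "summarize this PDF", the chunks whose embeddings align with summary-like wording score highest, and everything else in the document is never sent to the model.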
Best regards,