alejandro-ao / ask-multiple-pdfs

A Langchain app that allows you to chat with multiple PDFs

add metadata in similarity search #18

Open Canada-wet opened 1 year ago

Canada-wet commented 1 year ago

Hi, thanks for the brilliant video. I just saw a comment asking for metadata, such as page numbers, in similarity search results. I played around with it, made a few changes, and this works for me. Please check whether it helps.
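To illustrate where page metadata can come from, here is a minimal sketch that assumes each PDF's page texts have already been extracted (for example with PyPDF2's `PdfReader(path).pages` and `page.extract_text()`). The function name `pages_to_texts_and_metadatas`, the `source`/`page` keys, 1-based page numbers, and skipping blank pages are all assumptions for illustration. This is not the code from the original comment.

```python
def pages_to_texts_and_metadatas(pdf_pages: dict[str, list[str]]) -> tuple[list[str], list[dict]]:
    """Hypothetical helper: build parallel texts/metadatas lists, one entry per page.

    pdf_pages maps a source name (e.g. a file name) to that file's page texts,
    assumed to be already extracted, one string per page.
    """
    texts: list[str] = []
    metadatas: list[dict] = []
    for source, pages in pdf_pages.items():
        # Assumption: page numbers are reported 1-based, as a reader would see them.
        for page_number, page_text in enumerate(pages, start=1):
            if not page_text.strip():
                continue  # assumption: skip pages with no extractable text
            texts.append(page_text)
            # texts[i] and metadatas[i] always describe the same page.
            metadatas.append({'source': source, 'page': page_number})
    return texts, metadatas
```

Keeping two parallel lists, rather than one list of pairs, matches the `metadatas[i]` indexing that the follow-up question refers to.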

yuvrajpowar commented 1 year ago

What is the purpose of this statement: `metadata_input = [metadatas[i]]*len(texts_temp)`?

Canada-wet commented 1 year ago

> What is the purpose of this statement: `metadata_input = [metadatas[i]]*len(texts_temp)`?

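When we split a text into chunks, we also need to duplicate its metadata once per chunk. The list of texts and the list of metadata must still line up one-to-one when the FAISS vector store is created, so every chunk needs its own matching metadata entry.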

For example:

```python
initial_text = 'I love watching YouTube videos. I am also a YouTuber myself.'
initial_metadata = [{'source': 'random_blog1', 'page': 6}]

split_text = ['I love watching YouTube videos.', 'I am also a YouTuber myself.']
split_metadata = [{'source': 'random_blog1', 'page': 6}, {'source': 'random_blog1', 'page': 6}]
```
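The per-chunk duplication described above can be sketched as a loop. This is a minimal illustration, not the code from the original comment. `split_sentences` and `split_with_metadata` are hypothetical names, and a simple regex sentence splitter stands in for the LangChain text splitter the app actually uses.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Hypothetical stand-in for a real text splitter:
    # break after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def split_with_metadata(texts: list[str], metadatas: list[dict]) -> tuple[list[str], list[dict]]:
    split_texts: list[str] = []
    split_metadatas: list[dict] = []
    for i, text in enumerate(texts):
        texts_temp = split_sentences(text)
        # Repeat this text's metadata once per chunk so both output lists stay aligned.
        metadata_input = [metadatas[i]] * len(texts_temp)
        split_texts.extend(texts_temp)
        split_metadatas.extend(metadata_input)
    return split_texts, split_metadatas


split_text, split_metadata = split_with_metadata(
    ['I love watching YouTube videos. I am also a YouTuber myself.'],
    [{'source': 'random_blog1', 'page': 6}],
)
print(split_text)      # ['I love watching YouTube videos.', 'I am also a YouTuber myself.']
print(split_metadata)  # [{'source': 'random_blog1', 'page': 6}, {'source': 'random_blog1', 'page': 6}]
```

`[metadatas[i]] * len(texts_temp)` repeats a reference to the same dict. That is harmless while the metadata is only read, but if chunks might later get chunk-specific fields, build copies instead, e.g. `[dict(metadatas[i]) for _ in texts_temp]`. The aligned lists could then be passed together, e.g. `FAISS.from_texts(texts=split_text, embedding=embeddings, metadatas=split_metadata)`, assuming an `embeddings` object like the one the app already creates, so each retrieved chunk carries its source and page.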