AutoSurveys / AutoSurvey


How to build a personal database, and how are these .bin files constructed? #16

Open Lily653 opened 1 month ago


Hi,

Thank you for your amazing work on this project! I have a couple of questions regarding database construction and how the paper content is processed for AutoSurvey.

  1. Building My Own Database: I am trying to build my own database and would like to know how the faiss_pper_abs-embeddings.bin file is generated. I know that I should use the nomic-embed-text-v1 model, but I am not sure how to proceed from there. Could you provide guidance or details on how these .bin files are constructed? Any help would be greatly appreciated.

  2. Processing the First 1,500 Tokens: In the paper, you mention that the model focuses on "the main body of each paper (up to the first 1,500 tokens)." Could you clarify whether this means taking the paper's abstract, then truncating the paper body to its first 1,500 tokens, and feeding the result as the paper_content portion to the model's write function? I'd love to understand the exact implementation.
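
To make the second question concrete, here is a minimal sketch of the two readings I can think of. The function name `build_paper_content`, the `count_abstract` flag, and the whitespace tokenizer are placeholders of my own, not anything taken from the AutoSurvey code; the real implementation presumably uses a proper model tokenizer.

```python
from typing import Callable, List


def whitespace_tokenize(text: str) -> List[str]:
    # Placeholder tokenizer (assumption): splits on whitespace only.
    # The actual project may count tokens with a model-specific tokenizer.
    return text.split()


def build_paper_content(
    abstract: str,
    body: str,
    max_tokens: int = 1500,
    count_abstract: bool = False,
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> str:
    """Hypothetical helper illustrating two possible truncation schemes."""
    if count_abstract:
        # Reading B: abstract and body share one 1,500-token budget.
        tokens = tokenize(abstract + " " + body)
        return " ".join(tokens[:max_tokens])
    # Reading A: keep the abstract whole, then append the first
    # max_tokens tokens of the main body.
    body_tokens = tokenize(body)
    return abstract.strip() + "\n\n" + " ".join(body_tokens[:max_tokens])


if __name__ == "__main__":
    demo_body = " ".join(f"w{i}" for i in range(2000))
    content = build_paper_content("A short abstract.", demo_body)
    print(len(content.split()))
```

Is the actual behavior closer to reading A (abstract kept whole, body truncated) or reading B (a single budget over both)?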

Thanks again for your help!