How to build a personal database, and how are these .bin files constructed?

Hi,

Thank you for your amazing work on this project! I have a couple of questions regarding database construction and how the paper content is processed for AutoSurvey.

Building My Own Database: I am currently trying to build my own database and would like to know how the faiss_pper_abs-embeddings.bin file is generated. I know that I should use the nomic-embed-text-v1 model, but I am not sure how to proceed from there. Could you provide guidance or details on how these .bin files are constructed? Any help would be greatly appreciated.
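
To make the first question concrete, here is a minimal sketch of what I imagine the process might look like. It rests on several assumptions I have not confirmed: that the .bin files are FAISS indexes serialized with `faiss.write_index`, that a flat L2 index is acceptable, and that the abstracts are embedded with the `search_document: ` prefix that the nomic-embed-text-v1 model card asks for. The function names and the output path are hypothetical, not taken from the repository.

```python
from typing import List

# Task prefix the nomic-embed-text-v1 model card asks for on documents.
DOC_PREFIX = "search_document: "


def prepare_texts(abstracts: List[str]) -> List[str]:
    """Collapse whitespace and prepend the document prefix to each abstract."""
    return [DOC_PREFIX + " ".join(a.split()) for a in abstracts]


def build_index(abstracts: List[str], out_path: str) -> None:
    """Embed the abstracts and write a FAISS index to out_path (hypothetical helper)."""
    # Heavy dependencies are imported lazily so prepare_texts works without them.
    import faiss
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
    vectors = model.encode(prepare_texts(abstracts), convert_to_numpy=True)
    vectors = np.ascontiguousarray(vectors, dtype="float32")  # FAISS expects float32

    index = faiss.IndexFlatL2(vectors.shape[1])  # assumption: exact L2 search
    index.add(vectors)  # row i of the index corresponds to abstracts[i]
    faiss.write_index(index, out_path)
```

With this sketch, calling `build_index(abstracts, "my_abs_embeddings.bin")` would write one vector per abstract, in input order, so a separate mapping from row position to paper ID would still be needed. Is this roughly how the provided files were produced, or is a different index type or ID scheme used?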
Processing the First 1,500 Tokens: In the paper, you mention that the model focuses on "the main body of each paper (up to the first 1,500 tokens)." Could you clarify whether this means taking the paper's abstract, appending the paper text truncated to its first 1,500 tokens, and passing the result as the paper_content portion to the model's write function? I'd love to understand the exact implementation.
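
To illustrate the reading I have in mind for the second question, here is a small sketch. It is only my interpretation: the function names are hypothetical, and whitespace splitting stands in for whatever tokenizer the project actually uses, so the token counts would differ from the real ones.

```python
def truncate_tokens(text: str, max_tokens: int = 1500) -> str:
    """Keep at most max_tokens whitespace-separated tokens (stand-in tokenizer)."""
    tokens = text.split()
    return " ".join(tokens[:max_tokens])


def build_paper_content(abstract: str, body: str, max_tokens: int = 1500) -> str:
    """One possible reading: the abstract followed by the truncated main body."""
    return abstract.strip() + "\n\n" + truncate_tokens(body, max_tokens)
```

The alternative reading would be that the 1,500-token limit applies to the abstract and body together, in which case the truncation would be applied after concatenation instead.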

Thanks again for your help!

AutoSurveys / AutoSurvey

How to build a personal database, and how are these .bin files constructed? #16