Marker-Inc-Korea / RAGchain

Extension of Langchain for RAG. Easy benchmarking, multiple retrievals, reranker, time-aware RAG, and so on...
Apache License 2.0

change insert_one() to insert_many() in mongo_db save function. #455

Closed bwook00 closed 6 months ago

bwook00 commented 6 months ago

close #438

(Plus, if there is a duplicate "_id" key in MongoDB, an error occurs and everything stops) -> This case is already handled by pymongo itself, which raises BulkWriteError.

bwook00 commented 6 months ago

If I used pymongo's update_many() with upsert=True, it would apply the same update to every matching document, turning all existing passages into a "single passage".

So we take the upsert parameter directly. If it is true, we collect the ids that already exist into a list and update only those duplicates (and even then, we use bulk_write to keep the number of iterations as small as possible).

Instead of updating everything, we simply call insert_many() for the ids that do not exist yet.

(I searched hard for a pymongo function that could update them all at once, but I couldn't find one...)
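
A rough sketch of the flow described above, not the actual code in this PR. The names `split_by_existing_ids` and `save_passages` are hypothetical, passages are assumed to be plain dicts with an "_id" key, and `collection` is assumed to be a pymongo `Collection`. Using `ReplaceOne` for the duplicates is also an assumption; the PR may build its bulk_write requests differently.

```python
from typing import Any, Dict, List, Set, Tuple

Passage = Dict[str, Any]  # assumed shape: a dict that contains an "_id" key


def split_by_existing_ids(
    passages: List[Passage], existing_ids: Set[Any]
) -> Tuple[List[Passage], List[Passage]]:
    """Partition passages into (duplicates, new) by whether "_id" is already stored."""
    duplicates = [p for p in passages if p["_id"] in existing_ids]
    new = [p for p in passages if p["_id"] not in existing_ids]
    return duplicates, new


def save_passages(collection, passages: List[Passage], upsert: bool = False) -> None:
    """Hypothetical save function; `collection` is assumed to be a pymongo Collection."""
    # Imported lazily so the pure helper above can be used without pymongo installed.
    from pymongo import ReplaceOne

    if not passages:
        return  # insert_many() and bulk_write() both reject empty lists

    if not upsert:
        # A duplicate "_id" makes pymongo raise BulkWriteError here.
        collection.insert_many(passages)
        return

    # Look up which of the incoming ids are already stored (fetch only "_id").
    ids = [p["_id"] for p in passages]
    cursor = collection.find({"_id": {"$in": ids}}, {"_id": 1})
    existing_ids = {doc["_id"] for doc in cursor}

    duplicates, new = split_by_existing_ids(passages, existing_ids)

    if duplicates:
        # One round trip: each duplicate is replaced with its own new content.
        collection.bulk_write([ReplaceOne({"_id": p["_id"]}, p) for p in duplicates])
    if new:
        collection.insert_many(new)
```

With this split, each existing passage is replaced by its own new content rather than all of them receiving one shared update, and the non-existing ids still go through a single insert_many() call.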