Closed bwook00 closed 6 months ago
If I use pymongo's update_many() with upsert=True, the same update is applied to every matching document, so all existing passages would be overwritten into a "single passage".
So we take the upsert parameter directly. If it is true, we collect the ids that already exist into a list and update only those duplicates (and even then, we use bulk_write to minimize iteration as much as possible).
For the ids that do not exist yet, we skip the update and just call insert_many().
(I've been googling hard for a function to update all at once in pymongo, but I haven't found one...)
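A rough sketch of the flow described above, not the exact code in this PR. The names `partition_by_existing` and `add_passages` are made up for illustration, and `ReplaceOne` is only one possible choice of update operation (it overwrites the whole stored document); `collection` is assumed to be a pymongo `Collection`.

```python
def partition_by_existing(docs, existing_ids):
    """Split docs into (to_update, to_insert) by whether their _id is already stored."""
    existing = set(existing_ids)
    to_update = [d for d in docs if d["_id"] in existing]
    to_insert = [d for d in docs if d["_id"] not in existing]
    return to_update, to_insert


def add_passages(collection, docs, upsert=False):
    """Hypothetical helper: insert docs, overwriting duplicates only when upsert is true."""
    # Imported here so partition_by_existing stays usable without pymongo installed.
    from pymongo import ReplaceOne

    if not upsert:
        # A duplicate _id makes pymongo raise BulkWriteError here.
        collection.insert_many(docs)
        return

    ids = [d["_id"] for d in docs]
    # One query to learn which of the incoming ids are already in the collection.
    cursor = collection.find({"_id": {"$in": ids}}, {"_id": 1})
    existing_ids = [d["_id"] for d in cursor]
    to_update, to_insert = partition_by_existing(docs, existing_ids)

    if to_update:
        # One bulk_write call instead of one update call per duplicate.
        ops = [ReplaceOne({"_id": d["_id"]}, d) for d in to_update]
        collection.bulk_write(ops, ordered=False)
    if to_insert:
        collection.insert_many(to_insert)
```

Only the duplicates go through bulk_write, and each one gets its own filter, so no two passages end up with the same content.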
close #438
(Plus, if there is a duplicate "_id" key in MongoDB, an error occurs and everything stops.) -> This is already well supported by MongoDB itself (BulkWriteError)