BeanieODM / beanie

Asynchronous Python ODM for MongoDB
http://beanie-odm.dev/
Apache License 2.0
2.02k stars 214 forks source link

How to bulk update with upsert=True? #224

Closed PATAPOsha closed 1 year ago

PATAPOsha commented 2 years ago

Using beanie==1.10.1 Trying to preform bulk update with upsert=True with no luck.

async def run_test():
    await init_mongo()

    docs = [TestDoc(a=f"id_{i}", b=random.randint(1, 100)) for i in range(10)]
    # docs = await insert_test_docs()
    async with BulkWriter() as bulk_writer:
        for doc in docs:
            await TestDoc \
                .find_one({TestDoc.a: doc.a}, bulk_writer=bulk_writer) \
                .upsert(Set({TestDoc.b: random.randint(1, 100)}), on_insert=doc, bulk_writer=bulk_writer)
                # also tried with `.update_one()`
                # .update_one(Set({TestDoc.b: random.randint(1, 100)}), bulk_writer=bulk_writer, upsert=True)

also tried .find_one() + .update_one(). Result is the same. Documents being updated only if they were inserted previously. But if collection is empty - no documents are inserted after exuction. So, upsert doesn't work. How to preform bulk update with upsert=True correctly?

With pymongo code for the same logic will be following:

def write_reviews(self, docs: List[TestDoc]):
  operations = []
  for doc in docs:
      doc_dict = to_dict(doc)
      update_operation = pymongo.UpdateOne(
          {"a": doc.a}, {"$set": doc_dict}, upsert=True
      )
      operations.append(update_operation)

  result = self.test_collection.bulk_write(operations)
roman-right commented 2 years ago

Hm, it could be a bug. Let me check

roman-right commented 2 years ago

Ah, the thing is upsert was not supported by the bulk writer at all, but the interface allowed to pass the parameter (as it has **kwargs there - so no error was raised) and for update case it worked. This is a nice catch! I'll fix it by the weekend if this is not super urgent for you. For now, you can do stuff without bulk writing.

PATAPOsha commented 2 years ago

Great! Thanks!

roman-right commented 2 years ago

I had a problem with the doc generation yesterday, so now it should work in 1.10.2. Please check :)

PATAPOsha commented 2 years ago

@roman-right I've checked version 1.10.3. And things do not seem to work properly for me. In your example here https://github.com/roman-right/beanie/blob/main/tests/odm/documents/test_bulk_write.py#L134 This code actually inserts single document one by one. Instead it should "stack" operations inside bulk_writer and execute query to mongo only after await bulk_writer.commit(). At least that's how I understand bulk writes and what I was asking for.

Right how it works here https://github.com/roman-right/beanie/blob/main/tests/odm/documents/test_bulk_write.py#L31 It executes query to mongo only after it quits from with context manager. Considering this maybe it would be easier to pass upsert=True parameter to find_one().update() construction?

roman-right commented 2 years ago

So, yes. My tests were incorrect, sorry. I'll roll back this soon.

I dug a little deeper into this problem again and the thing is - MongoDB has some weird limitations. It supports upserts, but it is impossible to use for example $set or $inc operator in the update query together with $setOnInsert for the same fields. Beanie is about structured documents and can not allow upserting only part of the document and needs to pass the whole document if nothing was found. I can remove fields, used in other update operations from the inserting document, but this behavior is unexpected, as the user set the inserting document explicitly and expects that it will be passed as it is. This is why it uses 2-steps upserting and needs to know, that nothing was updated before inserting - it is impossible to use BulkWriter this way.

I'll try to find a workaround for this, but probably it will not be supported at all.

PATAPOsha commented 2 years ago

Maybe some check of document "fullness" in upsert mode? Like validation of all fields of model before query to mongo.

PATAPOsha commented 2 years ago

Also, this looks more suitable for this case https://github.com/roman-right/beanie/issues/231. With returnDocument parameter it would always retrun full model.

roman-right commented 2 years ago

Yes, I plan to try this way. I can not promise that this will work, as probably there are limitations there again. I'll update you here about the results. The features themselves independently are worse to implement for sure.

ghost commented 2 years ago

@roman-right Did you have a chance to review the issue?

mg3146 commented 1 year ago

Hi - I am having the same issue. I am trying to bulk-update with upsert.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 14 days with no activity.

leng-yue commented 1 year ago

Also, this looks more suitable for this case #231. With returnDocument parameter it would always retrun full model.

Did u figure out how to return document?