BeanieODM / beanie

Asynchronous Python ODM for MongoDB
http://beanie-odm.dev/
Apache License 2.0
2.02k stars 215 forks source link

[BUG] [need help]validate_on_save's behavior #775

Closed hd10180 closed 7 months ago

hd10180 commented 11 months ago

Describe the bug please regret for my bad english.

beanie's behavior maybe not the same as before. i always use model_copy(update=schema.model_dump()) to update an document (it used to be .copy(update=schema.dict())), but i found the data in database not the same as use ODM, eg: PydanticObjectId will be dump to str after use model_dump(), it used to be ObjectId in database if i set validate_on_save=True, it seems change to an str now.

For this changes the data in the database (which I think is more important) that I create an issue instead of discussion. maybe relate to #664 , and i use the Reproduce code on #664

To Reproduce

import asyncio

from beanie import Document, PydanticObjectId, init_beanie
from motor.motor_asyncio import AsyncIOMotorClient
from pydantic import BaseModel, Field

class Base(Document):
    id: PydanticObjectId = Field(default_factory=PydanticObjectId)

class Tag(Base):
    number: int
    related: PydanticObjectId

    class Settings:
        name = "Tag"
        validate_on_save = True

class UpdateSchema(BaseModel):
    id: PydanticObjectId | None = None
    number: int | None = None
    related: PydanticObjectId | None = None

client = AsyncIOMotorClient("mongodb://localhost:27017")
db = client["test_beanie"]

async def init():
    await init_beanie(database=db, document_models=[Tag])

async def start_test():
    await init()
    tag_new = Tag(number=1, related=PydanticObjectId())
    await tag_new.create()
    print(tag_new)

    tag2 = await Tag.find_one(Tag.id == tag_new.id)
    print(f"find after create: {tag2}")

    update = UpdateSchema(related=PydanticObjectId())
    tag2 = tag2.model_copy(update=update.model_dump(exclude_unset=True))
    tag2.number = 999
    print(f"after model_copy: {tag2}")

    print("_______Do Save_________")
    await tag2.save()
    print(f"after save: {tag2}")

    tag3 = await Tag.find_one(Tag.id == tag_new.id)
    print(f"actually in database use beanie find_one: {tag3}")
    raw = await db[Tag.get_settings().name].find_one({"_id": tag_new.id})
    print(f"actually in database use motor find_one: {raw}")

async def main():
    await start_test()

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

Expected behavior the field must be an ObjectId but now it is str

Additional context it make me confuse because the different type between database and model

hd10180 commented 11 months ago

This is my solution, but I don't know if it lacks necessary considerations.

DocType = TypeVar("DocType", bound="Base")

class Base(Document):
    id: PydanticObjectId = Field(default_factory=PydanticObjectId)

    def model_copy(
        self: DocType, *, update: dict[str, Any] | None = None, deep: bool = False
    ) -> DocType:
        """overwrite model_copy, re-init model"""
        copied = super().model_copy(update=update, deep=deep)
        return self.__class__(**copied.model_dump())
roman-right commented 11 months ago

Hi @hd10180 , Thank you for the catch! It looks critical. I'll check it tomorrow.

roman-right commented 11 months ago

Reproduced. It's an interesting coincidence. The model_copy with an update doesn't validate what comes from the update on the Pydantic side:

update: Values to change/add in the new model. Note: the data isn't validated
                before creating the new model. You should trust this data.

On the Beanie side, validation before save doesn't update the document, only checks if fields are valid. PydanticObjectId should be representable both as str (to be able to receive JSON data) and as a binary object. It accepts str, and it's okay with it.

Thanks for the catch.

You can try the fixed version here: https://github.com/roman-right/beanie/pull/776

I'll add the respective tests tomorrow and then publish it.

bkis commented 10 months ago

I know I'm late as @roman-right already merged a fix, so I actually hope that what I'm doing here is superfluous. Two days ago I had a similar problem and prepared an as-close-to-reality-as-possible MRE for an issue here. Maybe it helps people looking for answers in the future.

This is very similar to the example posted here (and even the test in #776, actually):

import asyncio

from beanie import Document, PydanticObjectId, init_beanie
from motor.motor_asyncio import AsyncIOMotorClient
from pydantic import BaseModel

class Resource(Document):
    shares: list[PydanticObjectId] = []

    class Settings:
        name = "resources"
        validate_on_save = True

class ResourceUpdate(BaseModel):
    """Update model (for CRUD endpoints) with all fields optional"""

    shares: list[PydanticObjectId] | None = None

class User(BaseModel):
    id: PydanticObjectId

async def main():
    # setup DB
    client = AsyncIOMotorClient("mongodb://127.0.0.1:27017")
    await init_beanie(
        database=client.some_db,
        document_models=[Resource],
    )
    client.some_db["something"].drop()

    # create dummy user
    user = User(id="654ba3a6ec7833e469dde77a")

    # create dummy resource
    res = await Resource().create()

    # create update model that would be request model of a PATCH endpoint
    updates = ResourceUpdate(shares=[user.id])
    # update resource with the passed updates - to apply all the updates at once, the
    # model must be dumped at some point if we don't want to loop through its attributes
    # and apply them one by one, so the updated values all get serialized here
    # (PydanticObjectId -> str)
    await Resource.find_one(Resource.id == res.id).set(
        updates.model_dump(exclude_unset=True)
    )

    # at this point, the value for `shares` in the DB is a list of strings :(
    # so now if there's a request asking for all resources that are shared with
    # our user, we have to convert the ID we're querying for to str (not pretty!)

    # find resources shared with user (using `user.id` and not `str(user.id)`!)
    res = await Resource.find_one(Resource.shares == user.id, with_children=True)

    print(res)  # this would print "None" without the fix

if __name__ == "__main__":
    asyncio.run(main())

So yes, I agree, this bug is rather critical as it breaks a very common webapp data handling strategy (or at least makes it very hacky to get it to work). Looking forward to the release of the fix!

hd10180 commented 7 months ago

fixed #776