BeanieODM / beanie

Asynchronous Python ODM for MongoDB
http://beanie-odm.dev/
Apache License 2.0
2.03k stars 215 forks source link

[BUG] Link DBRefs do not preserve extra attributes when calling `document.model_dump()` #1041

Open MrEarle opened 1 week ago

MrEarle commented 1 week ago

Describe the bug When dumping a model with links, the corresponding DBRef dicts don't keep extra attributes.

To Reproduce

class User(Document):
    email: str

class ToDo(Document):
    user: Link[User]

async def main():
    client = AsyncIOMotorClient(db_uri)
    await init_beanie(
        database=client.get_database("test"),
        document_models=[User, ToDo],
    )

    user = await User(email="test@example.com").create()

    user_db_ref = DBRef("user", user.id, _extra={"email": user.email})

    todo = await ToDo(user=user_db_ref).create()

    print(todo.model_dump()["user"])  # Does not print the email field

Expected behavior A clear and concise description of what you expected to happen.

I would expect that the serialization of the DBRef contains the extra fields it contains. This helps in having an extended reference pattern.

Additional context

Playing around with beanie's implementation, I got it working by modifying the to_dict method of Link:

    def to_dict(self):
        extra = {k: v for k,v in self.ref.as_doc().items() if k not in {"$ref", "$id", "$db"}}
        return {**extra, "id": str(self.ref.id), "collection": self.ref.collection}

Edit: This would probably change how to do serialization with Pydantic. For example, in build_validation for Pydantic V2, if isinstance(v, dict) and v.keys() == {"id", "collection"} is used to detect if the dict is a DBRef. Given that keys can now contain more items, the strict equality would no longer hold.

staticxterm commented 1 week ago

I would technically not characterize this as a bug as this was simply never implemented. My guess is that DBRef comes from PyMongo right? https://github.com/mongodb/mongo-python-driver/blob/master/bson/dbref.py The thing is, Link objects are essentially Pydantic models. And when you try to serialize attributes that are private in the model, they are simply omitted Ref: https://docs.pydantic.dev/latest/concepts/models/#private-model-attributes So to support your use case, some 'hacking' would be necessary, as you've demonstrated.

MrEarle commented 1 week ago

Well, Links are not pydantic models, they are classes that can be serialized by pydantic. The serialization very explicitly only encodes the inner self.ref's id and collection, while ignoring any other attribute it has.

Also, when creating a document with a link, you need to either call to_ref or build the DBRef yourself. Maybe this is not a bug, but I don't think it is desirable behavior to omit the fields that the DBRef contains.

MrEarle commented 1 week ago

To further drive de discussion, Beanie showcases Links as a way to filter a document based on the linked document field. In my example, it would be:

Todo.find(Todo.user.email == email, fetch_links=True)

But this performs a lookup before filtering, and cannot take advantages of indices, which requires expensive colscan operations. This get harder for more complex real-world examples.

A way to improve on this, is to have extra fields, like in the embedded document pattern, that can be indexed (IndexModel(user.email)). That way, we can perform the above query with an index and without the need to fetch links. We can then fetch the links after performing the query, which would not involve a colscan just to filter the Todo collection.

But the current implementation does not support this.

staticxterm commented 1 week ago

I see. That may be very useful but would probably complicate the design or overall logic. If I'm understanding this correctly, we can have any additional fields under the $ref field, which then somehow need to stay in sync from wherever they are originally defined, is this the expectation? (I'm thinking about when this embedded "Link" data needs to be updated) If so, this seems like much work.

But I do agree that at least this possibility to add extra fields is currently missing, and your proposed code in Additional context seems like a way to go and probably wouldn't break anything if extra fields are added.

MrEarle commented 1 week ago

Well, it should be the responsibility of the developer to determine when to update the extra fields in the DBRef, as that is part of the business logic of their use cases. The advantages of embedded document pattern are to improve read performance, at the expense of write performance, which is usually rarer. This is a common (and recommended) pattern for MongoDB.

Maybe the solution is not to add this functionality to Links, but have a new type Link-ish feature that represents an embedded document.