BeanieODM / beanie

Asynchronous Python ODM for MongoDB
http://beanie-odm.dev/
Apache License 2.0
2.09k stars 219 forks source link

Serialization Issues with PydanticObjectId and Link in Beanie 1.27.0 [BUG] #1043

Open martincpt opened 1 month ago

martincpt commented 1 month ago

Describe the bug

Since upgrading to Beanie 1.27.0, we've encountered several serialization problems with PydanticObjectId and Link objects.

It seems that PydanticObjectId no longer automatically casts itself to a string primitive during Pydantic serialization.

To Reproduce

Example 1:

class MyDoc(Document):
    related_id: str

MyDoc(related_id=PydanticObjectId())

Ends up with pydantic_core._pydantic_core.ValidationError.

Example 2:

# When testing with FastAPI's TestClient - converting dict to JSON
client.post(endpoint, json={"related_id": PydanticObjectId()})

Ends up with TypeError: Object of type PydanticObjectId is not JSON serializable.

The above snippets worked fine in version 1.26.0 but now fails in 1.27.0.

Expected behavior The expected behavior is that PydanticObjectId should automatically cast itself to a string during Pydantic serialization, as it did in version 1.26.0.

Additional context We traced the issue to the introduction of a new when_used="json" attribute in the __get_pydantic_core_schema__ methods, which seems to be the core cause of the problem.

We acknowledge that our type casting may not have been very strict, but the previous behavior was convenient and worked seamlessly. Without this attribute, all serialization appears to function as expected.

Questions:

  1. What was the rationale behind introducing when_used="json"?
  2. Was this change—specifically the loss of automatic string serialization—intentional, or can we expect a fix in future releases?

Thank you for your hard work and for maintaining such an awesome package!

danielxpander commented 1 month ago

Same for me. Rollbacked to 1.26.0

07pepa commented 1 month ago

hi @danielxpander @martincpt

  1. rationale begind when_used="json" is it should convert only in json serialization cases otherwise it should conserve its native type for naive types. PydanticObjectId is de facto naive type for mongo
  2. imho No , Mongo can save PydanticObjectId in non _id field i consider this an anipatern (since you should store stuff as close as original of its form as possible. I strogly recomending to run pymongo or compas and fix type of related_id in your database (it can bring speed improvement since it is stored natively)

BTW secondly this makes valid MyDoc(related_id="🔥") i am guessing this will not work and by looks of things i am guessing this is referencing another table and that wont work

lets recapitulate Imho

class MyDoc(Document):
    related_id: str

MyDoc(related_id=PydanticObjectId())

Should expected to fail in same way putting arbitrary object into another object that does not inherit 1st without conversion. It was bug from start.

also Here is workaround i am proposing for you code if you do not want to convert your database should work as well

class MyDoc(Document):
    related_id: PydanticObjectId

MyDoc(related_id=PydanticObjectId())

this is probably what you wanted in 1st place but i would double check and convert data

07pepa commented 1 month ago

@danielxpander @martincpt please share your thoughts... but there is definite rationale and i consider this was serialization bug and carry over from pydantic v1

bkis commented 1 month ago

Rationale or not, this is still a breaking change in a minor release.

While I can see how it's debatable whether type(model.model_dump()["id"]) should be <class 'beanie.odm.fields.PydanticObjectId'> or str, it is a breaking change that is causing problems.

I have a really big codebase relying on PydanticObjectId to be "python-serialized" as str and this is now a problem as this change is not backwards compatible.

martincpt commented 1 month ago

@07pepa

In this specific case, we have a logging protocol where we chose to store the value as a string because it's not always a PydanticObjectId. Of course, in other cases, we prefer to store data in its original form, specifically using a Link type (a.k.a DBRef).

I understand your perspective, but I feel it might be a bit strict. I don't think the previous, more lenient serialization was a significant issue. As long as it's not introducing a security risk, I would be in favor of keeping it as is—or at least giving users the option to choose whether the package should allow serialization from a string or not.

marwan-alloreview commented 3 weeks ago

Just want to make sure that the 1.27.0 behaviour will not change ?

I agree that this was a breaking change. But in my opinion, when using model_dump with mode "python" it should not serialize ObjectIds to string. So the new behaviour makes more sense (at least to me).

I am considering relying on the 1.27.0 behaviour but I want to make sure it won't be changed back to the 1.26.0 behaviour in future versions ?

martincpt commented 3 weeks ago

I have to admit, I hadn’t noticed that model_dump was converting ObjectId to a string, which is indeed not ideal—I’d also prefer the original format.

However, this raises a question for me: could there be a solution where model_dump retains the ObjectId, but in the opposite case—if the string format is valid—it could convert a string back to an ObjectId?

I believe this wouldn’t necessarily conflict with fundamental principles. What do you think, @07pepa?