dapper91 / pydantic-xml

python xml for humans
https://pydantic-xml.readthedocs.io
The Unlicense
155 stars 16 forks source link

Serialization of union of similar models #72

Closed GuillaumeDesforges closed 1 year ago

GuillaumeDesforges commented 1 year ago

If a root model has a field of type the union of two models which are similar, the serializer will pick the first one in the union.

I've implemented this test https://github.com/GuillaumeDesforges/pydantic-xml/commit/f36068168f890bb50efdeaa3565f1659bd00450d

Which raises

_________________________________________________________________ test_model_union_similar _________________________________________________________________

    def test_model_union_similar():
        class SubModel1(BaseXmlModel, tag='model1'):
            text: str = element()

        class SubModel2(BaseXmlModel, tag='model2'):
            text: str = element()

        class TestModel(BaseXmlModel, tag='model'):
            field1: Union[SubModel1, SubModel2] = element()

        xml = '''
        <model><model1><text>foo</text></model1></model>
        '''

        actual_obj = TestModel.from_xml(xml)
        expected_obj = TestModel(
            field1=SubModel1(text="foo"),
        )

        assert actual_obj == expected_obj

        actual_xml = actual_obj.to_xml()
        assert_xml_equal(actual_xml, xml)

        xml = '''
        <model><model2><text>foo</text></model2></model>
        '''

        actual_obj = TestModel.from_xml(xml)
        expected_obj = TestModel(
            field1=SubModel2(text="foo"),
        )

        assert actual_obj == expected_obj

        actual_xml = actual_obj.to_xml()
>       assert_xml_equal(actual_xml, xml)

tests/test_unions.py:108: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

left = '<model>\n  <model1>\n    <text>foo</text>\n  </model1>\n</model>\n', right = '<model>\n  <model2>\n    <text>foo</text>\n  </model2>\n</model>\n'
ignore_comments = True, pretty = True, kwargs = {}, diffs = [RenameNode(node='/model/model1[1]', tag='model2')]
parser = <lxml.etree.XMLParser object at 0x7fa3e6f25a20>

    def assert_xml_equal(
            left: Union[str, bytes],
            right: Union[str, bytes],
            *,
            ignore_comments: bool = True,
            pretty: bool = True,
            **kwargs,
    ):
        diffs = xmldiff.main.diff_texts(left, right, **kwargs)

        if ignore_comments:
            diffs = list(filter(lambda diff: not isinstance(diff, xmldiff.actions.InsertComment), diffs))

        if diffs:
            if pretty:
                parser = etree.XMLParser(remove_blank_text=True, remove_comments=ignore_comments)
                left = etree.tostring(etree.fromstring(left, parser=parser), pretty_print=True).decode()
                right = etree.tostring(etree.fromstring(right, parser=parser), pretty_print=True).decode()
>               assert not diffs, '\n' + '\n'.join(difflib.Differ().compare(left.splitlines(), right.splitlines()))
E               AssertionError: 
E                 <model>
E               -   <model1>
E               ?         ^
E               
E               +   <model2>
E               ?         ^
E               
E                     <text>foo</text>
E               -   </model1>
E               ?          ^
E               
E               +   </model2>
E               ?          ^
E               
E                 </model>

tests/helpers.py:30: AssertionError
dapper91 commented 1 year ago

Hmm...

I'm not sure that is the pydantic-xml bug.

The following pure pydantic code behave the same way:

>>> from typing import Union
>>> from pydantic import BaseModel
>>> 
>>> 
>>> class SubModel1(BaseModel):
...     text: str
... 
>>> 
>>> class SubModel2(BaseModel):
...     text: str
... 
>>> 
>>> class TestModel(BaseModel):
...     field1: Union[SubModel1, SubModel2]
... 
>>> 
>>> obj = TestModel.parse_obj({'field1': SubModel2(text='foo')})
>>> print(obj)
field1=SubModel1(text='foo')

I think the problem is that parse_obj transforms the object to a dict before parsing. Since SubModel1 and SubModel2 have the same signature, union choose the first model matched which is SubModel1.

GuillaumeDesforges commented 1 year ago

Thanks for the reply.

My issue is not about the deserialization but the serialization.

The error in my test does not fail when parsing XML My test fails when trying to serialize

        actual_xml = actual_obj.to_xml()
>       assert_xml_equal(actual_xml, xml)

Does to_xml rely on parse_obj?

dapper91 commented 1 year ago

The error in my test does not fail when parsing XML. My test fails when trying to serialize

Yes, I understand that

Does to_xml rely on parse_obj?

Yes, in context of Union type the decision which deserializer to use is based on the actual field type, which in turn derived using pydantic parse_obj. So if pydantic choose a wrong field type on serialization, a wrong deserializer is chosen on deserialization.

dapper91 commented 1 year ago

@GuillaumeDesforges found the solution.

Switch on smart_union flag for your model:

class TestModel(BaseXmlModel, tag='model'):
    class Config:
        smart_union = True

    field1: Union[SubModel1, SubModel2] = element()
GuillaumeDesforges commented 1 year ago

Thanks a lot, that's very helpful!