Closed skewty closed 8 months ago
@skewty Hi
Your model is incorrect. A sub-model is bound to an entire sub-element. More information here.
In your example the first FooPersonData is bound to the first senderdata sub-element. The subsequent FooPersonData items in the tuple are bound to the senderdata sub-elements that follow.
In other words your model is defined for the following document:
<request>
    <senderdata>
        <address>400</address>
        <name></name>
    </senderdata>
    <senderdata>
        <address>401</address>
        <name>Bob Andersen</name>
    </senderdata>
    <senderdata>
        <address>402</address>
        <name>Hansi</name>
    </senderdata>
    ...
</request>
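As a rough illustration of the "one sub-model per sub-element" layout described above, here is a minimal standard-library sketch that reads such a nested document. It does not use pydantic-xml; the helper name read_people and the sample data are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Nested layout: one <senderdata> element per person (sample data only).
NESTED_XML = """<request>
  <senderdata><address>400</address><name></name></senderdata>
  <senderdata><address>401</address><name>Bob Andersen</name></senderdata>
  <senderdata><address>402</address><name>Hansi</name></senderdata>
</request>"""


def read_people(xml_text: str) -> list[dict[str, str]]:
    """Return one dict per <senderdata> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    people = []
    for block in root.findall("senderdata"):
        # findtext() yields "" for an element that is present but empty,
        # and the given default when the element is missing entirely.
        people.append(
            {
                "address": block.findtext("address", default=""),
                "name": block.findtext("name", default=""),
            }
        )
    return people


people = read_people(NESTED_XML)
for person in people:
    print(person)
```

Because every person has its own parent element here, the empty name stays attached to address 400 and nothing needs to be zipped together afterwards.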
Thanks for the detailed response.
Is it possible to get pydantic-xml to work with the data above?
I looked and didn't see an equivalent to pydantic's RootModel in pydantic-xml, which in my mind would be something like: __root__: tuple[FooPersonData, ...].
class FooPersonData(BaseXmlModel, tag="persondata", search_mode="unordered"):
    address: tuple[str, ...] | None = element(tag="address", default=None)
    name: tuple[str, ...] | None = element(tag="name", default=None)

class FooRequest(BaseXmlModel, tag="request", search_mode="unordered"):
    sender_data: FooPersonData | None = element(tag="senderdata", default=None)
This is the best I can come up with, and then I "zip" them together myself.
The model.to_xml() output is obviously different in this case.
request = FooRequest.from_xml("""<request>
<senderdata>
<address>400</address>
<name></name>
<address>401</address>
<name>Bob Andersen</name>
<address>402</address>
<name>Hansi</name>
<address>403</address>
<name>George Lucas</name>
<address>404</address>
<name>Michael Jensen</name>
<address>406</address>
<name>406</name>
<address>407</address>
<name>Fenger</name>
<address>408</address>
<name>408</name>
<address>410</address>
<name>410</name>
</senderdata>
</request>
""")
for name, address in zip(request.sender_data.name, request.sender_data.address):
    print(f"name={name!r} address={address!r}")
Gives:
name='Michael Jensen' address='400'
name='Hansi' address='401'
name='406' address='402'
name='Bob Andersen' address='403'
name='Fenger' address='404'
name='George Lucas' address='406'
name='408' address='407'
which is wrong since I need empty name to match with address 400. This throws all the rest off.
I came up with this:
from pydantic_xml import BaseXmlModel, element, wrapped

class Address(BaseXmlModel, tag='address'):
    value: str

class Name(BaseXmlModel, tag='name'):
    value: str = ""

class FooPersonData(BaseXmlModel):
    address: str = element(tag="address", default="")
    name: str = element(tag="name", default="")

class FooRequest(BaseXmlModel, tag='request'):
    # Flat, alternating <address>/<name> children of <senderdata>
    sender_data_raw: tuple[Address | Name, ...] = wrapped(path="senderdata", default=None)

    @property
    def sender_data(self) -> tuple[FooPersonData, ...] | None:
        if self.sender_data_raw is not None:
            # Even positions hold addresses, odd positions hold names
            return tuple((
                FooPersonData(address=address.value, name=name.value)
                for address, name in zip(self.sender_data_raw[0::2], self.sender_data_raw[1::2])
            ))
        return None
That's awesome! How can I buy you a coffee / tea or something for your effort?
Coming back to this so I can use pydantic v2 and pydantic-xml in production.
I think there is an issue in serialization in pydantic-xml because:
class PersonDataDef(BaseXmlModel, populate_by_name=True, skip_empty=False):
    address: Annotated[str | None, element(tag="address", default=None)]
    name: Annotated[str | None, element(tag="name", default=None)]
    location: Annotated[str | None, element(tag="location", default=None)]
    status: Annotated[int | None, element(tag="status", default=None)]
    status_info: Annotated[str | None, element(tag="statusinfo", default=None)]

    @model_validator(mode="before")
    @classmethod
    def _watch_out_for_nones(cls, values: dict) -> dict:
        return values  # the dict here doesn't contain key+value pairs for status nor statusinfo
but when I go to serialize using output_xml = request.to_xml(skip_empty=True) I am seeing:
<senderdata><address>991</address><name>991</name><location>SME VoIP</location><status>None</status><statusinfo>None</statusinfo></senderdata>
in the output. This <status>None</status><statusinfo>None</statusinfo> is incorrect / invalid.
I was trying for a cleaner solution than what you have above. Your solution above didn't serialize correctly in production, so I needed to solve that too.
This is the approach I was working with before I stopped / got stuck:
class SenderDataDef(RootXmlModel, tag="senderdata"):
    root: tuple[PersonDataDef, ...]

    @classmethod
    def __build_serializer__(cls) -> None:
        super().__build_serializer__()
        patched_deserialize = partial(cls._deserialize, cls.__xml_serializer__)
        setattr(cls.__xml_serializer__, "deserialize", patched_deserialize)

    @classmethod
    def _deserialize(
        cls, self: ModelSerializer, element: XmlElementReader | None, *, context: dict[str, Any] | None
    ) -> BaseXmlModel | None:
        # actual_xml = """<senderdata>
        #     <address>400</address>
        #     <name></name>
        #     <address>401</address>
        #     <name>Bob Andersen</name>
        #     <address>402</address>
        #     <name>Hansi</name>
        #     <address>403</address>
        #     <name>George Lucas</name>
        #     <address>404</address>
        #     <name>Michael Jensen</name>
        #     <address>406</address>
        #     <name>406</name>
        #     <address>407</address>
        #     <name>Fenger</name>
        #     <address>408</address>
        #     <name>408</name>
        #     <address>410</address>
        #     <name>410</name>
        # </senderdata>"""
        items = [
            {"address": "400", "name": ""},
            {"address": "401", "name": "Bob Andersen"},
            {"address": "402", "name": "Hansi"},
            {"address": "403", "name": "George Lucas"},
            {"address": "404", "name": "Michael Jensen"},
            {"address": "406", "name": "406"},
            {"address": "407", "name": "Fenger"},
            {"address": "408", "name": "408"},
            {"address": "410", "name": "410"},
        ]  # is it actually possible to get this information out of element? I tried and couldn't figure it out
        result = tuple(PersonDataDef.model_validate(item) for item in items)
        return self._model.model_validate(result, strict=False, context=context)
but as you can see in the comment, I wasn't able to extract enough data from element. I was expecting to get the raw ElementTree object here so I could get all the children but couldn't figure it out.
If I got deserialize working, I was going to use a similar approach to serialize.
@skewty Hi
Answering the first question: the PersonDataDef model overrides the skip_empty flag passed to to_xml, so remove it from the model:
class PersonDataDef(BaseXmlModel, populate_by_name=True):
    address: Annotated[str | None, element(tag="address", default=None)]
    name: Annotated[str | None, element(tag="name", default=None)]
    location: Annotated[str | None, element(tag="location", default=None)]
    status: Annotated[int | None, element(tag="status", default=None)]
    status_info: Annotated[str | None, element(tag="statusinfo", default=None)]
or define your own None serialization format.
I was expecting to get the raw ElementTree object here so I could get all the children but couldn't figure it out.
You can get the raw ElementTree element from the XmlElementReader and iterate over all sub-elements:
element_iter = element.to_native()
items = [
    {
        'address': addr.text,
        'name': name.text,
    }
    for addr, name in zip(element_iter[0::2], element_iter[1::2])
]
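The same even/odd pairing can be tried outside pydantic-xml. The sketch below is a standard-library approximation, assuming the native element behaves like an xml.etree.ElementTree.Element and that the children strictly alternate address, name. The helper name pair_children and the shortened sample document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened sample of the flat layout from this thread.
FLAT_XML = """<senderdata>
<address>400</address>
<name></name>
<address>401</address>
<name>Bob Andersen</name>
</senderdata>"""


def pair_children(parent: ET.Element) -> list[dict[str, str]]:
    """Pair children by position: even index = address, odd index = name."""
    children = list(parent)  # direct children, in document order
    return [
        # An empty element has text None, so fall back to "".
        {"address": addr.text or "", "name": name.text or ""}
        for addr, name in zip(children[0::2], children[1::2])
    ]


items = pair_children(ET.fromstring(FLAT_XML))
for item in items:
    print(item)
```

Note that positional pairing silently misaligns if any record omits one of the two elements, so it only suits documents where the alternation is guaranteed.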
Not sure how I didn't figure that out when I tried something similar myself. Anyway, your very helpful assistance quickly led to this solution:
class PersonDataDef(BaseXmlModel, populate_by_name=True, skip_empty=False):
    # address field may have type= attribute with one of: IPEI, ALARM, BEACON, CONFIG
    address: Annotated[str | None, element(tag="address", default=None)]
    name: Annotated[str | None, element(tag="name", default=None)]
    location: Annotated[str | None, element(tag="location", default=None)]
    status: Annotated[int | None, element(tag="status", default=None)]
    status_info: Annotated[str | None, element(tag="statusinfo", default=None)]

class SenderDataDef(RootXmlModel, tag="senderdata"):
    root: tuple[PersonDataDef, ...]

    @classmethod
    def __build_serializer__(cls) -> None:
        super().__build_serializer__()
        patched_deserialize = partial(cls._deserialize, cls.__xml_serializer__)
        setattr(cls.__xml_serializer__, "deserialize", patched_deserialize)
        patched_serialize = partial(cls._serialize, cls.__xml_serializer__)
        setattr(cls.__xml_serializer__, "serialize", patched_serialize)

    @classmethod
    def _deserialize(
        cls, self: ModelSerializer, element_: XmlElementReader | None, *, context: dict[str, Any] | None
    ) -> BaseXmlModel | None:
        items = []
        item = {}
        for child_element in element_:
            # A repeated tag marks the start of the next person record
            if child_element.tag in item:
                items.append(item)
                item = {}
            item[child_element.tag] = "" if child_element.text is None else child_element.text
        if len(item) > 0:
            items.append(item)
        result = tuple(PersonDataDef.model_validate(item) for item in items)
        return self._model.model_validate(result, strict=False, context=context)

    @classmethod
    def _serialize(
        cls,
        self: ModelSerializer,
        element_: "XmlElementWriter",
        value: BaseXmlModel,
        encoded: Dict[str, Any],
        *,
        skip_empty: bool = False,
    ) -> XmlElementWriter | None:
        for item in encoded:  # type: dict
            for tag, text in item.items():
                if text is not None:
                    e = element_.make_element(tag, None)
                    e.set_text(text)
                    element_.append_element(e)
        return element_

    def __len__(self):
        return self.root.__len__()

    def __getitem__(self, item):
        return self.root.__getitem__(item)
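The grouping rule used in _deserialize above (start a new record whenever a tag repeats) can be exercised on its own. This is a standard-library sketch of that rule only, not of the pydantic-xml hooks. The function name group_children and the sample document, including its location element, are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def group_children(parent: ET.Element) -> list[dict[str, str]]:
    """Split flat children into records; a repeated tag starts a new record."""
    items: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for child in parent:
        if child.tag in current:
            # Seeing a tag twice means the previous record is complete.
            items.append(current)
            current = {}
        # Empty elements have text None; store "" instead.
        current[child.tag] = child.text or ""
    if current:
        items.append(current)
    return items


SAMPLE = (
    "<senderdata>"
    "<address>400</address><name></name>"
    "<address>401</address><location>Lab</location>"
    "<address>402</address><name>Hansi</name>"
    "</senderdata>"
)

records = group_children(ET.fromstring(SAMPLE))
for record in records:
    print(record)
```

Unlike positional pairing, this tolerates records with a different set of fields, as long as each record begins with a tag that also appeared in the previous record.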
Now that I have that model working, it is failing on many others (simple models that should work).
There seems to be something wrong with upstream deserialization. I believe the issue comes from xml.etree.ElementTree.Element; an empty string is mistakenly becoming None.
Regardless, it should be caught: a None text value should be converted to ''.
Look at this line of code:
    item[child_element.tag] = "" if child_element.text is None else child_element.text
Please observe how pydantic_xml isn't compatible with itself in the following example:
from typing import Annotated
from pydantic_xml import BaseXmlModel, element

class PersonDataDef(BaseXmlModel, populate_by_name=True, skip_empty=False, tag="persondata"):
    # address field may have type= attribute with one of: IPEI, ALARM, BEACON, CONFIG
    address: Annotated[str | None, element(tag="address", default=None)]
    name: Annotated[str | None, element(tag="name", default=None)]
    location: Annotated[str | None, element(tag="location", default=None)]
    status: Annotated[int | None, element(tag="status", default=None)]
    status_info: Annotated[str | None, element(tag="statusinfo", default=None)]

input_xml = "<persondata><address>991</address><status>0</status><statusinfo/></persondata>"
input_model = PersonDataDef.from_xml(input_xml)
output_xml = input_model.to_xml(skip_empty=True)
output_model = PersonDataDef.from_xml(output_xml)
assert input_model == output_model
Would you like me to create a new issue, or shall we change the title of this issue to something like "empty string becoming None during deserialization" and re-purpose this issue, as it has everything in it?
Please observe how python_xml isn't compatible with itself in the following example ...
Empty texts indeed become None during deserialization; that is how the underlying deserialization libraries (xml.etree, lxml) work, which seems reasonable since the text values are not actually provided.
The problem during serialization is that XML, unlike JSON for example, doesn't support None natively, so it is not obvious how to serialize it. There are many ways to do that: 'None', 'none', 'nil', '', 'xsi:nil', etc.; I have seen all of them. So right now it is up to the developer to solve that. If you wish to alter the default serialization format, you can define your own type like this:
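from typing import Annotated, Optional, TypeVar
from pydantic import PlainSerializer

InnerType = TypeVar('InnerType')
# Optional value that is serialized as an empty string when it is None
XmlOptional = Annotated[
    Optional[InnerType],
    PlainSerializer(lambda val: val if val is not None else ''),
]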
and use it instead of Optional.
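To see why the choice of None encoding matters for round trips, here is a small standard-library sketch. It only mimics the two encodings discussed in this thread ('None' versus ''); the helper write_field and its none_as parameter are hypothetical and are not part of pydantic-xml.

```python
import xml.etree.ElementTree as ET


def write_field(tag: str, value, none_as: str) -> str:
    """Serialize one field, writing `none_as` when the value is None."""
    elem = ET.Element(tag)
    elem.text = none_as if value is None else str(value)
    return ET.tostring(elem, encoding="unicode")


# Both spellings of an empty element parse to text None.
self_closed = ET.fromstring("<statusinfo/>").text
open_close = ET.fromstring("<statusinfo></statusinfo>").text

# Writing None as the literal string "None" does not round-trip ...
as_literal = ET.fromstring(write_field("statusinfo", self_closed, none_as="None")).text
# ... while writing it as an empty string parses back to None.
as_empty = ET.fromstring(write_field("statusinfo", self_closed, none_as="")).text

print(repr(self_closed), repr(open_close), repr(as_literal), repr(as_empty))
```

Encoding None as an empty string keeps parse and serialize symmetric, at the cost of no longer distinguishing a missing value from an empty one.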
Anyway, I am thinking about changing the default None serialization format in the next release.
@skewty Starting from 2.3.0, a None value is encoded as an empty string.
All my model unit tests pass on 2.3.0. I went looking for a financial contribution method on the GitHub project page and didn't see one. Maybe check out https://github.com/sponsors when you have some free time. Sincere thank you.