Open notamonad opened 4 weeks ago
It seems that the current deserialization logic at https://github.com/dapper91/pydantic-xml/blob/ce20508122261879036288764d0d8d05450f4302/pydantic_xml/serializers/factories/model.py#L193
will always return None for elements with empty values, and as such will not add them to the result dictionary in the same function. Is there a way to hook into the deserialization behavior for such fields? It seems to me that parsing an empty element into a field with the value None should be a valid use case,
so as to be able to tell the difference between <someelement></someelement>
and <someotherelement></someotherelement>.
@notamonad Hi,
The problem is that the text property of an empty element is None
(the behavior of the underlying library):
from lxml import etree
root = etree.fromstring("<root></root>")
assert root.text is None
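The same behavior can be checked with the standard library's xml.etree.ElementTree, used here only so that the snippet runs without lxml. It also suggests that, at the tree level, an empty element is still distinguishable from an absent one: find() returns an element for the former and None for the latter. This is a minimal sketch, and the tag names are only illustrative:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<protocol><hello></hello></protocol>")

hello = root.find("hello")  # element is present but has no text
bye = root.find("bye")      # element is absent

# An empty element has text None, the same as in the lxml example.
assert hello is not None and hello.text is None
# An absent element is reported as None by find().
assert bye is None
```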
The workaround is to define an inner model like this:
from pydantic_xml import BaseXmlModel, element, wrapped
from pydantic import ConfigDict


class Payload(BaseXmlModel):
    model_config = ConfigDict(extra='forbid')


class ResponsePayload(Payload):
    response: str = element("response")


class Hello(BaseXmlModel, tag="protocol"):
    """
    <protocol>
        <hello></hello>
    </protocol>
    """
    hello: Payload = element("hello")


class HelloResponse(BaseXmlModel, tag="protocol"):
    """
    <protocol>
        <hello>
            <response>ok</response>
        </hello>
    </protocol>
    """
    response: ResponsePayload = element("hello")


class Bye(BaseXmlModel, tag="protocol"):
    """
    <protocol>
        <bye></bye>
    </protocol>
    """
    bye: Payload = element("bye")

hello = """
<protocol>
<hello></hello>
</protocol>
"""
hello_response = """
<protocol>
<hello>
<response>ok</response>
</hello>
</protocol>
"""
bye = """
<protocol>
<bye></bye>
</protocol>
"""
# correct behavior
Hello.from_xml(hello)
HelloResponse.from_xml(hello_response)
Bye.from_xml(bye)
print(HelloResponse.from_xml(hello)) # raises an Exception
print(Hello.from_xml(hello_response)) # raises an Exception
print(Hello.from_xml(bye)) # raises an Exception
print(Bye.from_xml(hello)) # raises an Exception
I am using pydantic_xml to parse and serialize custom protocol messages. As part of my parsing logic, I register each valid protocol message in a list of supported messages with a decorator. During parsing, I iterate over the registered models, look for the first one that does not raise a ValidationError, and return that model to the user.
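For readers unfamiliar with this pattern, here is a rough sketch of a decorator-based registry with first-match parsing. It deliberately avoids pydantic_xml and uses only the standard library; REGISTRY, register, parse_hello, parse_bye and parse_message are hypothetical names, and ValueError stands in for ValidationError:

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional

# Hypothetical registry: each parser returns a dict on success and
# raises ValueError when the message does not match.
Parser = Callable[[ET.Element], dict]
REGISTRY: List[Parser] = []


def register(parser: Parser) -> Parser:
    """Decorator that adds a parser to the list of supported messages."""
    REGISTRY.append(parser)
    return parser


@register
def parse_hello(root: ET.Element) -> dict:
    elem = root.find("hello")
    # Require the element to be present and to have no children.
    if elem is None or len(elem) != 0:
        raise ValueError("not a hello message")
    return {"type": "hello"}


@register
def parse_bye(root: ET.Element) -> dict:
    elem = root.find("bye")
    if elem is None or len(elem) != 0:
        raise ValueError("not a bye message")
    return {"type": "bye"}


def parse_message(xml: str) -> Optional[dict]:
    """Return the result of the first registered parser that accepts xml."""
    root = ET.fromstring(xml)
    for parser in REGISTRY:
        try:
            return parser(root)
        except ValueError:
            continue
    return None
```

With this scheme the result depends on registration order whenever more than one parser accepts a message, so each parser has to reject everything that is not its own.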
I first encountered difficulty distinguishing Request/Response messages in which the request contains an always-empty element and the response contains that same element with a mandatory subelement. I seem to have gotten past that by using the min_length and max_length parameters. However, I then ran into issues distinguishing Hello/Bye messages, where each message consists of empty elements with different tags, as shown in the code snippet.
What I want is to match hello only to the Hello model and bye only to the Bye model, yet either message can be matched to either model in this case. This seems counterintuitive to me: even though the elements are empty, they have distinct tags, yet both models accept both messages.
Is there a good strategy for ensuring that exactly one model matches each message in this case?
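One possible strategy, independent of pydantic_xml, is to compute a structural signature of the message before validation and look it up in a table, so that each message can map to at most one model. The sketch below uses only the standard library; MESSAGES, signature and match are hypothetical names, and it assumes that every message has exactly one payload element under <protocol>:

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Signature: (tag of the payload element, tags of its children).
Signature = Tuple[str, Tuple[str, ...]]

# Hypothetical mapping from signature to message name; in practice the
# values could be the model classes themselves.
MESSAGES: Dict[Signature, str] = {
    ("hello", ()): "Hello",
    ("hello", ("response",)): "HelloResponse",
    ("bye", ()): "Bye",
}


def signature(xml: str) -> Signature:
    """Describe a <protocol> message by its payload tag and child tags."""
    root = ET.fromstring(xml)
    payload = root[0]  # assumes exactly one payload element under <protocol>
    return payload.tag, tuple(child.tag for child in payload)


def match(xml: str) -> str:
    """Return the single message name for xml, or raise KeyError."""
    return MESSAGES[signature(xml)]
```

Because the lookup is a dictionary access, a message either resolves to one entry or fails with KeyError, and the selected model can then be used for the actual validation.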