dapper91 / pydantic-xml

python xml for humans
https://pydantic-xml.readthedocs.io
The Unlicense

[Question] Distinguishing between models with empty elements #196

Open notamonad opened 4 weeks ago

notamonad commented 4 weeks ago

I am using pydantic_xml to parse and serialize custom protocol messages. As part of my parsing logic, I register each valid protocol message in a list of supported messages with a decorator. During parsing, I iterate over the registered models, look for the first one that does not raise a ValidationError, and return that model to the user.

I first had difficulty distinguishing Request/Response messages in which the request contains an always-empty element and the response contains that same element with a mandatory subelement. I seem to have gotten past that by using the min_length and max_length parameters. However, I then ran into issues distinguishing Hello/Bye messages, where each message consists of an empty element with a different tag, as shown in the code snippet below.

What I want is to match hello only to the Hello model and bye only to the Bye model, yet either message can be matched to either model in this case. This seems counterintuitive to me: even though both elements are empty, they have distinct tags, yet both models accept both messages.

from pydantic_xml import BaseXmlModel, element, wrapped

class Hello(BaseXmlModel, tag="protocol"):
    """
    <protocol>
        <hello></hello>
    </protocol>
    """
    hello: str = element("hello", default="", min_length=0, max_length=0)

class HelloResponse(BaseXmlModel, tag="protocol"):
    """
    <protocol>
        <hello>
            <response>ok</response>
        </hello>
    </protocol>
    """
    response: str = wrapped("hello", element("response"))

class Bye(BaseXmlModel, tag="protocol"):
    """
    <protocol>
        <bye></bye>
    </protocol>
    """
    bye: str = element("bye", default="", min_length=0, max_length=0)

hello = """
    <protocol>
        <hello></hello>
    </protocol>
"""

hello_response = """
    <protocol>
        <hello>
            <response>ok</response>
        </hello>
    </protocol>
"""

bye = """
    <protocol>
        <bye></bye>
    </protocol>
"""

# correct behavior
Hello.from_xml(hello)
HelloResponse.from_xml(hello_response)
Bye.from_xml(bye)

# unexpected: both calls succeed instead of raising a ValidationError
Hello.from_xml(bye)
Bye.from_xml(hello)

Is there a good strategy for ensuring that exactly one model matches each message in this case?
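
For reference, the register-and-try-each-model approach described at the top of this post can be sketched roughly as follows. This is a minimal sketch, not code from pydantic-xml: `MESSAGE_MODELS`, `register`, `parse_message`, and `AmbiguousMessage` are hypothetical names, and the `_TagModel` stand-ins use only the standard library so the snippet runs on its own. Unlike a "first match wins" loop, this variant collects every model that accepts the message and fails loudly when more than one does, which makes the Hello/Bye overlap visible.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry of message models (not part of pydantic-xml).
MESSAGE_MODELS = []


def register(cls):
    """Class decorator that adds a message model to the registry."""
    MESSAGE_MODELS.append(cls)
    return cls


class AmbiguousMessage(Exception):
    """Raised when more than one registered model accepts a message."""


def parse_message(xml, errors=(ValueError,)):
    """Try every registered model and insist on exactly one match.

    `errors` is the exception type(s) a model raises on a mismatch; with
    real pydantic-xml models this would be the ValidationError being caught.
    """
    matches = []
    for model in MESSAGE_MODELS:
        try:
            matches.append(model.from_xml(xml))
        except errors:
            continue
    if not matches:
        raise ValueError("no registered model accepts this message")
    if len(matches) > 1:
        names = ", ".join(type(m).__name__ for m in matches)
        raise AmbiguousMessage(f"message matched several models: {names}")
    return matches[0]


class _TagModel:
    """Stand-in for a real model: accepts <protocol> with one given child tag."""

    child_tag = ""

    @classmethod
    def from_xml(cls, xml):
        root = ET.fromstring(xml)
        children = [child.tag for child in root]
        if root.tag != "protocol" or children != [cls.child_tag]:
            raise ValueError(f"{cls.__name__} does not match")
        return cls()


@register
class Hello(_TagModel):
    child_tag = "hello"


@register
class Bye(_TagModel):
    child_tag = "bye"


message = parse_message("<protocol><bye></bye></protocol>")
print(type(message).__name__)
```

With models that overlap, as in the snippet above, `parse_message` would raise `AmbiguousMessage` rather than silently returning whichever model happened to be registered first.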

notamonad commented 3 weeks ago

It seems that the current deserialization logic at https://github.com/dapper91/pydantic-xml/blob/ce20508122261879036288764d0d8d05450f4302/pydantic_xml/serializers/factories/model.py#L193 will always return None for elements with empty values and so will not add them to the result dictionary in the same function.

Is there a way to hook into the deserialization behavior for such fields? It seems to me that parsing an empty element into a field with the value None should be a valid use case, so that one can tell the difference between <someelement></someelement> and <someotherelement></someotherelement>.
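
Until there is such a hook, one possible stopgap is to inspect the tags before handing the message to a model. The sketch below is an assumption about how that could look, not a pydantic-xml feature: `has_only_empty_child` is a hypothetical helper built on the standard library's `xml.etree.ElementTree`, and it checks that the root has exactly one child with the expected tag and that the child carries no text and no subelements.

```python
import xml.etree.ElementTree as ET


def has_only_empty_child(xml, tag):
    """True if the root has exactly one child named `tag` that is empty.

    Hypothetical helper: "empty" here means no subelements and no text
    other than whitespace.
    """
    root = ET.fromstring(xml)
    children = list(root)
    if len(children) != 1 or children[0].tag != tag:
        return False
    child = children[0]
    no_text = child.text is None or not child.text.strip()
    return no_text and len(child) == 0


hello = "<protocol><hello></hello></protocol>"
hello_response = "<protocol><hello><response>ok</response></hello></protocol>"
bye = "<protocol><bye></bye></protocol>"

print(has_only_empty_child(hello, "hello"))           # True
print(has_only_empty_child(bye, "hello"))             # False
print(has_only_empty_child(hello_response, "hello"))  # False
```

A check like this could run before `from_xml` for the models whose only content is an empty element, at the cost of parsing the message twice.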

dapper91 commented 4 days ago

@notamonad Hi,

The problem is that the text property of an empty element is None (the behavior of the underlying library):

from lxml import etree

root = etree.fromstring("<root></root>")
assert root.text is None

The workaround is to define an inner model like this:

from pydantic_xml import BaseXmlModel, element
from pydantic import ConfigDict

class Payload(BaseXmlModel):
    model_config = ConfigDict(extra='forbid')

class ResponsePayload(Payload):
    response: str = element("response")

class Hello(BaseXmlModel, tag="protocol"):
    """
    <protocol>
        <hello></hello>
    </protocol>
    """
    hello: Payload = element("hello")

class HelloResponse(BaseXmlModel, tag="protocol"):
    """
    <protocol>
        <hello>
            <response>ok</response>
        </hello>
    </protocol>
    """
    response: ResponsePayload = element("hello")

class Bye(BaseXmlModel, tag="protocol"):
    """
    <protocol>
        <bye></bye>
    </protocol>
    """
    bye: Payload = element("bye")

hello = """
    <protocol>
        <hello></hello>
    </protocol>
"""

hello_response = """
    <protocol>
        <hello>
            <response>ok</response>
        </hello>
    </protocol>
"""

bye = """
    <protocol>
        <bye></bye>
    </protocol>
"""

# correct behavior
Hello.from_xml(hello)
HelloResponse.from_xml(hello_response)
Bye.from_xml(bye)

print(HelloResponse.from_xml(hello))  # raises an Exception
print(Hello.from_xml(hello_response))  # raises an Exception
print(Hello.from_xml(bye))   # raises an Exception
print(Bye.from_xml(hello))   # raises an Exception