dapper91 / pydantic-xml

python xml for humans
https://pydantic-xml.readthedocs.io
The Unlicense
141 stars, 14 forks

How to sensibly represent 'nillable=True' XML elements #145

Closed: jwfraustro closed this issue 7 months ago

jwfraustro commented 7 months ago

Hi,

First off, thanks for the excellent library, I'm a big fan!

I'm hoping you can help me with a pain point I'm having trying to represent nillable XML elements in a model. I've tried to represent these elements a few different ways, and have only found one satisfactory, but not ideal, solution.

For a simple XSD definition like this:

<xs:complexType name="Job">
    <xs:sequence>
        <xs:element name="startTime" type="xs:dateTime" nillable="true" />
    </xs:sequence>
</xs:complexType>

I'd like to be able to instantiate a Job model with either a startTime value or None.

For example, consider:

Job(start_time=datetime.now()).to_xml()

# <Job>
#   <startTime>2021-01-01T00:00:00</startTime>  # via isoformat()
# </Job>

Job(start_time=None).to_xml()

# <Job>
#   <startTime xsi:nil="true" />
# </Job>
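To make the target output concrete, here is a minimal sketch of the serialization rule these two examples describe. It uses only the standard library's xml.etree.ElementTree, not pydantic-xml: a set value is written as element text via isoformat(), and None becomes an empty element carrying xsi:nil="true". The helper name nillable_element is hypothetical. Note that the xsi prefix must be declared (xmlns:xsi=...) for the output to be well-formed; ElementTree emits that declaration on the root element.

```python
import datetime as dt
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace of the XML Schema instance vocabulary, which defines xsi:nil.
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# Serialize this namespace with the "xsi" prefix instead of "ns0".
ET.register_namespace("xsi", XSI_NS)


def nillable_element(parent: ET.Element, tag: str, value: Optional[dt.datetime]) -> ET.Element:
    """Hypothetical helper: append <tag> to parent, marked nil if value is None."""
    child = ET.SubElement(parent, tag)
    if value is None:
        # No content: mark the element explicitly as nil.
        child.set(f"{{{XSI_NS}}}nil", "true")
    else:
        # isoformat() yields a valid xs:dateTime lexical value.
        child.text = value.isoformat()
    return child


job = ET.Element("Job")
nillable_element(job, "startTime", None)
print(ET.tostring(job, encoding="unicode"))
```

With a datetime instead of None, no xsi namespace is used, so the output is just the element with its ISO 8601 text.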

The way we were able to achieve this was with a class that (simplified) looks like this:

from typing import Optional

from pydantic_xml import BaseXmlModel, computed_attr, element

class NillableElement(BaseXmlModel, skip_empty=True, nsmap={"xsi": "http://www.w3.org/2001/XMLSchema-instance"}):
    """An element that can be 'nillable' in XML.

    If no value is provided, the element will be rendered as <element xsi:nil="true" />.
    """

    value: Optional[str] = None

    @computed_attr(name="nil", ns="xsi")
    def nil(self) -> Optional[str]:
        """If the value is None, return 'true'."""
        if self.value is None:
            return "true"
        else:
            return None

class Job(BaseXmlModel):
    start_time: Optional[NillableElement] = element(tag="startTime", default=NillableElement())

However, this means we lose the type hinting for start_time as a datetime object, and we have to access the value as job.start_time.value instead of job.start_time.

Another complication: we have a custom datetime type, with some extra validation logic, that inherits from datetime.datetime.

Unfortunately, as you know, combining simple types and models in a Union isn't supported, so we have to do something like this:

import datetime
from typing import Optional, Union

from pydantic_xml import BaseXmlModel, RootXmlModel, element

class CustomDatetime(datetime.datetime):
    ...

class CustomDatetimeElement(RootXmlModel[CustomDatetime]):
    ...

class Job(BaseXmlModel):
    start_time: Optional[Union[CustomDatetimeElement, NillableElement]] = element(tag="startTime", default=NillableElement())

A similar issue as before (plus some others), but now there's an ambiguity: we need to access either job.start_time.value or job.start_time.root, depending on whether the value has been set or not.

To complicate things further, because we store some of these models in a cache, we often have to round-trip XML <-> Model <-> JSON, which is quite a headache.
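To illustrate why a flat Optional[datetime] value would keep this round trip simple, here is a hedged, stdlib-only sketch (again not pydantic-xml): xsi:nil="true" (or "1", the other xs:boolean spelling of true) is read back as None, None is stored as JSON null, and a set value travels as an ISO 8601 string. The helper names parse_start_time, to_json and from_json are hypothetical.

```python
import datetime as dt
import json
import xml.etree.ElementTree as ET
from typing import Optional

# xsi:nil in ElementTree's {namespace}local-name notation.
XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"


def parse_start_time(job_xml: str) -> Optional[dt.datetime]:
    """Hypothetical helper: read <startTime>, mapping xsi:nil to None."""
    elem = ET.fromstring(job_xml).find("startTime")
    if elem is None:
        raise ValueError("<startTime> is missing")
    # xs:boolean accepts both "true" and "1" as true.
    if elem.get(XSI_NIL) in ("true", "1"):
        return None
    # Before Python 3.11, fromisoformat() rejects a trailing "Z".
    return dt.datetime.fromisoformat((elem.text or "").strip())


def to_json(start_time: Optional[dt.datetime]) -> str:
    """Hypothetical helper: None becomes null, datetimes become ISO strings."""
    raw = None if start_time is None else start_time.isoformat()
    return json.dumps({"start_time": raw})


def from_json(payload: str) -> Optional[dt.datetime]:
    """Hypothetical helper: inverse of to_json()."""
    raw = json.loads(payload)["start_time"]
    return None if raw is None else dt.datetime.fromisoformat(raw)


nil_xml = (
    '<Job xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<startTime xsi:nil="true"/></Job>'
)
print(to_json(parse_start_time(nil_xml)))  # {"start_time": null}
```

Because both formats map "no value" to the same Python None, no wrapper object has to be unwrapped differently depending on whether the value was set.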

The ideal, but probably not possible, solution would be to pass a nillable=True argument to the element function, and have it handle something like this automatically...

class Job(BaseXmlModel):
    start_time: Optional[CustomDatetime] = element(tag="startTime", nillable=True)

All of this to say, is there a better way to represent nillable XML elements in a model? I've been tearing my hair out trying to find a consistent way to do this, and I'm hoping you might have an idea.

If it's useful, since it's open-source, you can see the models we're currently working on here: spacetelescope/vo-models

P.S. I featured your library in a talk I gave, "Python TAP Implementation at MAST: Lessons Learned", under "Serialization Woes" :) I really hope more people start using this library, it's fantastic!

dapper91 commented 7 months ago

@jwfraustro Hi,

Thanks for your feedback!

Your solution can be improved using generics:

import datetime as dt
from typing import Optional, Generic, TypeVar
from pydantic_xml import BaseXmlModel, computed_attr, element

NillableType = TypeVar('NillableType')

class NillableElement(BaseXmlModel, Generic[NillableType], skip_empty=True, nsmap={"xsi": "http://www.w3.org/2001/XMLSchema-instance"}):
    value: Optional[NillableType] = None

    @computed_attr(name="nil", ns="xsi")
    def nil(self) -> Optional[str]:
        if self.value is None:
            return "true"
        else:
            return None

class Job(BaseXmlModel):
    start_time: NillableElement[dt.datetime] = element(tag="startTime")

In that case, you don't lose the type hinting.

Your proposal for a nillable=True parameter sounds reasonable, and I don't think it would be very difficult to implement. I will try to implement it in the next release.

jwfraustro commented 7 months ago

Thanks for the help!

And yes, a feature like that would be excellent! Looking forward to that release!