Closed bodograumann closed 5 months ago
@bodograumann If you would like name
to be optional, you should type-hint it as Optional
. This follows the Pydantic
convention for optional fields.
For example, the following code achieves what you are looking for, with an example on how it works for countries with and without a name
value:
from typing import Optional
from pydantic_xml import BaseXmlModel, attr
country_w_name = """<Country code="GER">Germany</Country>"""
country_wo_name = """<Country code="GER"></Country>"""
class Country(BaseXmlModel):
name: Optional[str] = None # or ""
code: str = attr()
named_country = Country.from_xml(country_w_name)
unnamed_country = Country.from_xml(country_wo_name)
print(named_country.to_xml())
# <Country code="GER">Germany</Country>
print(unnamed_country.to_xml())
# <Country code="GER"></Country>
The point I wanted to make, @jwfraustro: it is not obvious whether in """<Country code="GER"></Country>"""
the content is missing, or whether it is the empty string. You could define it either way.
XML Schema seems to indicate that it should be interpreted as an empty string, while pydantic-xml currently interpretes it as missing.
So either pydantic-xml should also regard it as an empty string, or the documentation should be expanded to make it clearer for the user.
Btw:
Optional[T]
is usually written T | None
these days.Thank you for the linting advice; pipe unions are not always a valid option in the code I support. I was only hoping to offer you a solution to the problem at hand.
I see your point regarding xml considering it an empty string. I think you are probably correct in your interpretation.
I will note that the documentation does note that it regards unset values as None
. It is mentioned here: https://pydantic-xml.readthedocs.io/en/latest/pages/misc.html#optional-type-encoding
This section applies to encoding, but I wager pydantic-xml
presumes and behaves the same when parsing xml.
Is the answer to just default to an empty string like so:
from pydantic_xml import BaseXmlModel, attr
class Country(BaseXmlModel, skip_empty=True):
name: str = ""
code: str = attr()
country_w_name = """<Country code="GER">Germany</Country>"""
country_wo_name = """<Country code="GER"></Country>"""
named_country = Country.from_xml(country_w_name)
unnamed_country = Country.from_xml(country_wo_name)
print(named_country.to_xml())
print(unnamed_country.to_xml())
Output:
b'<Country code="GER">Germany</Country>'
b'<Country code="GER"></Country>'
@bodograumann Hi.
The library deserializer receives the element text from the underlying xml parser (lxml or xml.etree) without any changes. They both interpret an empty element text as None
not as an empty string:
>>> from lxml import etree
>>>
>>> xml = '<Country code="GER"></Country>'
>>>
>>> root = etree.fromstring(xml)
>>> print(root.text)
None
I will clarify that in the documentation.
added it to the documentation.
Thank you @dapper91 . I think this will be very helpful to newcomers.
There is some ambiguity, how an element with empty text content should be parsed.
For example I have the following schema:
Naively I would implement this with pydantic-xml as
This does not allow the
name
to be the empty string however. The following is a valid document according to the schema:However, when parsing with pydanic_xml, I get an error:
So I think we should either:
name: str = ""
is required, to allow the empty content.I am not sure which is the better option. The current implementation gives the user more options, but it is not in line with xml schema.