dapper91 / pydantic-xml

python xml for humans
https://pydantic-xml.readthedocs.io
The Unlicense
141 stars 14 forks source link

False positive "Input should be a valid string" when empty string #171

Closed skewty closed 4 months ago

skewty commented 4 months ago

This Code

from pydantic_xml import RootXmlModel

class SenderAddress(RootXmlModel[str], tag="address"):
    pass

class SenderName(RootXmlModel[str], tag="name"):
    pass

class SenderLocation(RootXmlModel[str], tag="location"):
    pass

class SenderDataDef(RootXmlModel, tag="senderdata"):
    root: list[tuple[SenderAddress | None, SenderName | None, SenderLocation | None]]

SenderDataDef.from_xml(b"""<?xml version="1.0" encoding="UTF-8"?>
<senderdata>
    <address>400</address>
    <name></name>                <!-- this empty string causes the issue -->
    <address>401</address>
    <name>Bob Andersen</name>
    <address>402</address>
    <name>Hansi</name>
    <address>403</address>
    <name>George Lucas</name>
    <address>404</address>
    <name>Michael Jensen</name>
    <address>406</address>
    <name>406</name>
    <address>407</address>
    <name>Fenger</name>
    <address>408</address>
    <name>408</name>
    <address>410</address>
    <name>410</name>
</senderdata>
""")

Raises

pydantic_core._pydantic_core.ValidationError: 1 validation error for SenderDataDef
0.1
  [line -1]: Input should be a valid string [type=string_type, input_value=PydanticUndefined, input_type=PydanticUndefinedType]

Also Tried

I tried to use:

from pydantic import model_validator

    @model_validator(mode="before")
    @classmethod
    def _accept_empty_strings(cls, data: Any) -> Any:
        # catch the None and change it to "" here
        return data

but it doesn't run until after the error above.

Any ideas? (I believe this is an lxml "feature")

dapper91 commented 4 months ago

@skewty Hi,

The text of an empty element is parsed by the xml parser as None, not as an empty string:

>>> import lxml.etree
>>> root = lxml.etree.fromstring('<root><element></element></root>')
>>> root.find('element').text is None
True

but in your case SenderName is a non-optional string.

Therefore you should make it Optional:

class SenderName(RootXmlModel[Optional[str]], tag="name"):
    root: Optional[str] = None

or set a default value if you desire it to be a string anyway:

class SenderName(RootXmlModel[str], tag="name"):
    root: str = ''
skewty commented 4 months ago

Tried out your suggestion and came across this issue which was recently reported:

https://youtrack.jetbrains.com/issue/PY-70544/False-positive-UnionType-not-accepted-but-Optional-is.

Also, I see you are adding type hints to both the generic and root.

I saw, I believe in the docs, RootXmlModel being used without double typing as such:

class SenderName(RootXmlModel[Optional[str]], tag="name"):
    pass

Is providing the type for root field necessary?

This did the trick and solves my issues / makes it through my unit tests:

class SenderName(RootXmlModel[str], tag="name"):
    root: str = ''

class SenderDataDef(RootXmlModel, tag="senderdata"):
    root: list[tuple[SenderAddress | None, SenderName | None, SenderLocation | None]]

def test_request_system_info_many_from_xml():
    input_xml = Path("request-systeminfo-many.xml").read_bytes()
    request = models.SystemInfoRequest.from_xml(input_xml)
    assert len(request.sender_data) == 9
    assert request.sender_data[0].address == "400"
    assert request.sender_data[0].name == ""
    assert request.sender_data[-1].address == "410"
    assert request.sender_data[-1].name == "410"
    output_xml = request.to_xml(skip_empty=False)  # this must be false
    assert request == models.SystemInfoRequest.from_xml(output_xml)

Big thanks!

dapper91 commented 4 months ago

I saw, I believe in the docs, RootXmlModel being used without double typing as such:

class SenderName(RootXmlModel[Optional[str]], tag="name"):
    pass

Is providing the type for root field necessary?

Yes, you have to provide a type for the root field, but you may omit it for the class itself to get rid of double typing. So this should work too:

class SenderName(RootXmlModel, tag="name"):
    root: Optional[str] = None