dapper91 / pydantic-xml

python xml for humans
https://pydantic-xml.readthedocs.io
The Unlicense

Problem when parsing XML with namespaced elements deeper than level 2 #177

samholvi closed this issue 3 months ago

samholvi commented 3 months ago

Hi,

First of all, thank you for great software! I am having a problem parsing an XML document whose elements have namespaces. Here are my code and models (`test.py`):

import pathlib
import sys

from pydantic_xml import BaseXmlModel

class SomethingElse3(BaseXmlModel, tag='SomethingElse3'):
    SomethingElse4: str

class SomethingElse2(BaseXmlModel, tag='SomethingElse2'):
    SomethingElse3: SomethingElse3

class Something(
    BaseXmlModel,
    tag="SomethingElse",
    ns="something",
    nsmap={"something": "urn:something:something:v1"},
):
    SomethingElse2: SomethingElse2

def main(input_file):
    xml_doc = pathlib.Path(input_file).read_text()
    something = Something.from_xml(xml_doc)
    print(something.to_xml())

if __name__ == "__main__":
    main(sys.argv[1])

I also have two XML files. `input_ok.xml`:

<?xml version="1.0" encoding="UTF-8"?>
<something:SomethingElse
    xmlns:something="urn:something:something:v1"
>
    <something:SomethingElse2>
        <SomethingElse3>
            ABC
        </SomethingElse3>
    </something:SomethingElse2>
</something:SomethingElse>

and `input_error.xml`:

<?xml version="1.0" encoding="UTF-8"?>
<something:SomethingElse
    xmlns:something="urn:something:something:v1"
>
    <something:SomethingElse2>
        <something:SomethingElse3>
            ABC
        </something:SomethingElse3>
    </something:SomethingElse2>
</something:SomethingElse>

The only difference between them is that the SomethingElse3 element is namespaced in `input_error.xml`.
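At the XML level the two files really are different documents: a prefixed name such as `something:SomethingElse3` is expanded by the parser into the namespace URI plus the local name, while an unprefixed `SomethingElse3` stays in no namespace. A minimal sketch using only the standard library's `xml.etree.ElementTree` (not pydantic-xml) makes this visible; the helper `describe` is hypothetical, and the XML declaration is dropped from the sample strings for brevity.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

OK_XML = """<something:SomethingElse xmlns:something="urn:something:something:v1">
    <something:SomethingElse2>
        <SomethingElse3>ABC</SomethingElse3>
    </something:SomethingElse2>
</something:SomethingElse>"""

ERROR_XML = """<something:SomethingElse xmlns:something="urn:something:something:v1">
    <something:SomethingElse2>
        <something:SomethingElse3>ABC</something:SomethingElse3>
    </something:SomethingElse2>
</something:SomethingElse>"""


def describe(xml_text: str) -> list[tuple[int, str | None, str]]:
    """Hypothetical helper: list (depth, namespace URI, local name) per element."""
    result: list[tuple[int, str | None, str]] = []

    def walk(elem: ET.Element, depth: int) -> None:
        # ElementTree stores qualified tags in Clark notation: "{uri}local".
        if elem.tag.startswith("{"):
            uri, local = elem.tag[1:].split("}", 1)
        else:
            uri, local = None, elem.tag
        result.append((depth, uri, local))
        for child in elem:  # iterates child elements only, not text
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 1)
    return result


print(describe(OK_XML)[-1])     # (3, None, 'SomethingElse3')
print(describe(ERROR_XML)[-1])  # (3, 'urn:something:something:v1', 'SomethingElse3')
```

Levels 1 and 2 resolve to the same namespace in both files; only level 3 differs.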

Running `python test.py input_ok.xml` works and produces this output:

b'<something:SomethingElse xmlns:something="urn:something:something:v1"><something:SomethingElse2><SomethingElse3>\n            ABC\n        </SomethingElse3></something:SomethingElse2></something:SomethingElse>'

But running `python test.py input_error.xml` fails with the error below:

Traceback (most recent call last):
  File ".../test.py", line 31, in <module>
    main(sys.argv[1])
  File ".../test.py", line 26, in main
    something = Something.from_xml(xml_doc)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../venv/lib/python3.11/site-packages/pydantic_xml/model.py", line 402, in from_xml
    return cls.from_xml_tree(etree.fromstring(source), context=context)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../venv/lib/python3.11/site-packages/pydantic_xml/model.py", line 379, in from_xml_tree
    ModelT, cls.__xml_serializer__.deserialize(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../venv/lib/python3.11/site-packages/pydantic_xml/serializers/factories/model.py", line 201, in deserialize
    raise utils.build_validation_error(title=self._model.__name__, errors_map=field_errors)
pydantic_core._pydantic_core.ValidationError: 1 validation error for Something
SomethingElse2.SomethingElse3
  [line -1]: Field required [type=missing, input_value={}, input_type=dict]

So as soon as any element at level 3 or deeper has a namespace, the error occurs. I tried a few different ways of defining the models and model fields, with and without namespaces, but nothing helped. Could someone tell me whether I am defining the models incorrectly or whether this is a bug?

===

The same behaviour also occurs in the other direction, i.e. when generating XML from a model (`test2.py`):

from pydantic_xml import BaseXmlModel

class SomethingElse3(BaseXmlModel, tag='SomethingElse3'):
    SomethingElse4: str

class SomethingElse2(BaseXmlModel, tag='SomethingElse2'):
    SomethingElse3: SomethingElse3

class Something(
    BaseXmlModel,
    tag="SomethingElse",
    ns="something",
    nsmap={"something": "urn:something:something:v1"},
):
    SomethingElse2: SomethingElse2

class Container(BaseXmlModel):
    something: Something

something = Something(
    SomethingElse2=SomethingElse2(
        SomethingElse3=SomethingElse3(
            SomethingElse4="ABC"
        )
    )
)

print(something.to_xml())

Running `python test2.py` gives the following, where SomethingElse3 is again without a namespace:

b'<something:SomethingElse xmlns:something="urn:something:something:v1"><something:SomethingElse2><SomethingElse3>ABC</SomethingElse3></something:SomethingElse2></something:SomethingElse>'
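The output direction follows the same XML rule: a namespace belongs to each element's own name, so a child that is not explicitly qualified is serialized without a prefix even when its parent is namespaced. A small standard-library sketch (again `xml.etree.ElementTree`, not pydantic-xml) reproduces the shape of the `test2.py` output; `URI` is simply the namespace from the issue and `build` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

URI = "urn:something:something:v1"
# Ask ElementTree to serialize this URI with the "something" prefix.
ET.register_namespace("something", URI)


def build(qualify_level3: bool) -> str:
    """Hypothetical helper: build the three-level document from the issue."""
    root = ET.Element(f"{{{URI}}}SomethingElse")
    level2 = ET.SubElement(root, f"{{{URI}}}SomethingElse2")
    # Only a Clark-notation name "{uri}local" puts an element in a namespace;
    # being nested under a namespaced parent does not.
    tag3 = f"{{{URI}}}SomethingElse3" if qualify_level3 else "SomethingElse3"
    level3 = ET.SubElement(level2, tag3)
    level3.text = "ABC"
    return ET.tostring(root, encoding="unicode")


print(build(qualify_level3=False))  # ...<SomethingElse3>ABC</SomethingElse3>...
print(build(qualify_level3=True))   # ...<something:SomethingElse3>ABC</something:SomethingElse3>...
```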
dapper91 commented 3 months ago

@samholvi Hi,

Namespaces and namespace maps are not inherited by submodels. In your case SomethingElse3 doesn't have any namespace; it must be declared explicitly for each model:

from pydantic_xml import BaseXmlModel

NSMAP = {
    "something": "urn:something:something:v1",
}

class SomethingElse3(BaseXmlModel, tag='SomethingElse3', ns="something"):
    SomethingElse4: str

class SomethingElse2(BaseXmlModel, tag='SomethingElse2', ns="something", nsmap=NSMAP):
    SomethingElse3: SomethingElse3

class Something(BaseXmlModel, tag="SomethingElse", ns="something", nsmap=NSMAP):
    SomethingElse2: SomethingElse2

something = Something(
    SomethingElse2=SomethingElse2(
        SomethingElse3=SomethingElse3(
            SomethingElse4="ABC"
        )
    )
)

print(something.to_xml(pretty_print=True).decode())

This prints:

<something:SomethingElse xmlns:something="urn:something:something:v1">
  <something:SomethingElse2>
    <something:SomethingElse3>ABC</something:SomethingElse3>
  </something:SomethingElse2>
</something:SomethingElse>
samholvi commented 3 months ago

Hi! Thank you, that works! I definitely tried many combinations of parameters, but didn't think a missing nsmap could be the cause, since it works for second-level elements without specifying nsmap but not for third-level and deeper ones. What may also have confused me is this part of the docs: "Xml default namespace is a namespace that is applied to the element and all its sub-elements without explicit definition." (https://pydantic-xml.readthedocs.io/en/latest/pages/misc.html#default-namespace). It made me think the namespace would be inherited. But then this is not an issue anymore. Thank you.
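The quoted documentation sentence describes the XML default namespace, which is declared with a bare `xmlns="..."` attribute; a prefixed declaration such as `xmlns:something="..."` only applies to names that actually carry the prefix. A standard-library sketch (not pydantic-xml) shows the difference at parse time; the two document strings are adapted from the issue, and `level3_tag` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

URI = "urn:something:something:v1"

# Default namespace: unprefixed descendants are in it too.
DEFAULT_NS = (
    f'<SomethingElse xmlns="{URI}"><SomethingElse2>'
    "<SomethingElse3>ABC</SomethingElse3>"
    "</SomethingElse2></SomethingElse>"
)

# Prefixed namespace: only names written with the prefix are in it.
PREFIXED_NS = (
    f'<something:SomethingElse xmlns:something="{URI}"><something:SomethingElse2>'
    "<SomethingElse3>ABC</SomethingElse3>"
    "</something:SomethingElse2></something:SomethingElse>"
)


def level3_tag(xml_text: str) -> str:
    """Hypothetical helper: return the parsed tag of the level-3 element."""
    root = ET.fromstring(xml_text)
    return root[0][0].tag  # root -> SomethingElse2 -> SomethingElse3


print(level3_tag(DEFAULT_NS))   # {urn:something:something:v1}SomethingElse3
print(level3_tag(PREFIXED_NS))  # SomethingElse3
```

The documents in this issue use the prefixed form, so the default-namespace rule quoted from the docs does not apply to them.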