dapper91 / pydantic-xml

python xml for humans
https://pydantic-xml.readthedocs.io
The Unlicense
141 stars 14 forks

Reusing components for different namespaces #117

Closed maurosilber closed 9 months ago

maurosilber commented 9 months ago

This might be related to #91, if it is not the same issue. I tried to follow the code with a debugger, but couldn't work out where the problem is.

I have some components that are reused on different models under different root namespaces. I thought I could reuse them as follows:

from __future__ import annotations
from pydantic_xml import BaseXmlModel, attr, element

class L2(BaseXmlModel, tag="L2"):
    value: str = attr()

class L1(BaseXmlModel, tag="L1"):
    name: str = attr()
    values: list[L2] = element()

class NoNamespace(BaseXmlModel, tag="base"):
    value: str = attr()
    elements: list[L1] = element()

Then, I define a model identical to NoNamespace, but with a different namespace:

class V1_Namespace(NoNamespace, tag="base", nsmap={"": "v1"}):
    pass

Now, I define a helper function that builds the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<base {namespace} value="foo">
    <L1 name="bar">
        <L2 value="1" />
        <L2 value="2" />
    </L1>
</base>

and parses it with the given cls:

from xml.etree.ElementTree import fromstring

def helper(namespace: str, cls: type[BaseXmlModel]):
    xml = fromstring(
        f"""<?xml version="1.0" encoding="UTF-8"?>
<base {namespace} value="foo">
    <L1 name="bar">
        <L2 value="1" />
        <L2 value="2" />
    </L1>
</base>
    """
    )
    return cls.from_xml_tree(xml)

If I use NoNamespace, it works:

print(helper("", NoNamespace))
# value='foo' elements=[L1(name='bar', values=[L2(value='1'), L2(value='2')])]

But if I add the v1 namespace and use the corresponding class, I get the following error:

print(helper('xmlns="v1"', V1_Namespace))
# pydantic_core._pydantic_core.ValidationError: 1 validation error for L1
# values
#  Field required [type=missing, input_value={'name': 'bar'}, input_type=dict]

But this only happens when using nested models. If I remove L2:

class L1(BaseXmlModel, tag="L1"):
    name: str = attr()
    # values: list[L2] = element()

and use the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<base {namespace} value="foo">
    <L1 name="bar">
    </L1>
</base>

it works in both cases.

So, am I doing something wrong or is it not possible to reuse nested components?

By the way, thank you for this project! It has been really useful.

dapper91 commented 9 months ago

@maurosilber Hi

The problem is that the XML default namespace is inherited by all sub-elements, so the document:

<?xml version="1.0" encoding="UTF-8"?>
<base xmlns="v1" value="foo">
    <L1 name="bar">
    </L1>
</base>

is equivalent to:

<?xml version="1.0" encoding="UTF-8"?>
<v1:base xmlns:v1="v1" value="foo">
    <v1:L1 name="bar">
    </v1:L1>
</v1:base>

but sub-models do not inherit the parent model's namespace (because a model's serializer is built when the model is defined and can't be altered afterwards). In your example, V1_Namespace is looked up under the v1 namespace but L1 is not.
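The XML side of this can be checked with the standard library parser alone. The following is a minimal sketch, assuming `xml.etree.ElementTree` (the same parser imported in the example below); it only shows how the parser qualifies element names and does not involve pydantic-xml.

```python
from xml.etree.ElementTree import fromstring

# The default namespace declared on <base> also applies to the nested <L1>.
root = fromstring('<base xmlns="v1" value="foo"><L1 name="bar" /></base>')

# ElementTree spells qualified names as "{namespace}local-name".
tags = [e.tag for e in root.iter()]
print(tags)  # ['{v1}base', '{v1}L1']

# Unprefixed attributes are not placed in the default namespace.
print(root.attrib)  # {'value': 'foo'}
```

So a model that searches for a plain `L1` tag will not match `{v1}L1`, which is why every nested model needs the namespace too.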

So you have to declare each model twice: once without a namespace and once with the v1 namespace. To avoid duplicating code, you can use generic models:

from __future__ import annotations
from typing import Generic, TypeVar
from pydantic_xml import BaseXmlModel, attr, element
from xml.etree.ElementTree import fromstring

class L2(BaseXmlModel, tag="L2"):
    value: str = attr()

class L2_V1(L2, nsmap={"": "v1"}):
    pass

L2Type = TypeVar('L2Type', bound=L2)

class L1(BaseXmlModel, Generic[L2Type], tag="L1"):
    name: str = attr()
    values: list[L2Type] = element()

class L1_V1(L1, nsmap={"": "v1"}):
    pass

L1Type = TypeVar('L1Type', bound=L1)

class Namespace(BaseXmlModel, Generic[L1Type], tag="base"):
    value: str = attr()
    elements: list[L1Type] = element()

class NoNamespace(Namespace[L1[L2]]):
    pass

class V1_Namespace(Namespace[L1_V1[L2_V1]], tag="base", nsmap={"": "v1"}):
    pass

maurosilber commented 9 months ago

Is that a limitation of pydantic, or could the namespace also be inherited in the model definition somehow?

I found an alternative by removing the namespace from the tag, but then I would need to add it again if I wanted to export it:

ns = "v1"
for e in tree.iter():
    # ElementTree stores qualified tags as "{namespace}tag"
    e.tag = e.tag.removeprefix(f"{{{ns}}}")
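A rough sketch of the full round trip (strip before parsing, restore before export), assuming the standard library `xml.etree.ElementTree` and Python 3.9+ for `str.removeprefix`. The helper names `strip_namespace` and `restore_namespace` are hypothetical and not part of pydantic-xml.

```python
from xml.etree.ElementTree import Element, fromstring


def strip_namespace(tree: Element, ns: str) -> None:
    """Remove the "{ns}" qualifier from every element tag, in place."""
    prefix = f"{{{ns}}}"
    for e in tree.iter():
        if isinstance(e.tag, str):  # skip comments / processing instructions
            e.tag = e.tag.removeprefix(prefix)


def restore_namespace(tree: Element, ns: str) -> None:
    """Qualify every unqualified element tag with "{ns}", in place."""
    prefix = f"{{{ns}}}"
    for e in tree.iter():
        if isinstance(e.tag, str) and not e.tag.startswith("{"):
            e.tag = prefix + e.tag


tree = fromstring(
    '<base xmlns="v1" value="foo"><L1 name="bar"><L2 value="1" /></L1></base>'
)

strip_namespace(tree, "v1")
stripped_tags = [e.tag for e in tree.iter()]
print(stripped_tags)  # ['base', 'L1', 'L2']

restore_namespace(tree, "v1")
restored_tags = [e.tag for e in tree.iter()]
print(restored_tags)  # ['{v1}base', '{v1}L1', '{v1}L2']
```

This only works cleanly when the whole document lives in a single namespace; with mixed namespaces, the restore step cannot tell which elements were originally unqualified.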

For now, I think I'll go with the code duplication approach.

Thank you!