dapper91 / pydantic-xml

python xml for humans
https://pydantic-xml.readthedocs.io
The Unlicense

Modeling mappings as child elements? #179

Open · zygi opened this issue 2 months ago

zygi commented 2 months ago

Imagine I have the following XML:

<article>
<title>Hello</title>
<metadata>
<md_key_1>text_content_1</md_key_1>
<md_key_2>text_content_2</md_key_2>
...
</metadata>
</article>

That is, the metadata consists of a dynamic number of elements with dynamic tags and no attributes, each of which contains just text.

Ideally, I would like this to map to the following Python model:

from typing import Dict

class Article:
    title: str
    metadata: Dict[str, str]

Is there a way to achieve that with pydantic-xml? The closest I've gotten so far is making metadata a raw element field, but then working with it from the Python side is a little annoying: how do I construct an Article instance when metadata is an ET.Element? I could add an alternative constructor as a class method, but then I'd have to remember that for this one model I shouldn't use the regular constructor.
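For reference, converting between a raw metadata element and the desired dict takes only a few lines with the standard library's xml.etree.ElementTree. This is a minimal sketch independent of pydantic-xml; metadata_to_dict and dict_to_metadata are hypothetical helper names, and the sketch assumes every dict key is a valid XML tag name (nothing here checks that):

```python
import xml.etree.ElementTree as ET
from typing import Dict


def metadata_to_dict(metadata: ET.Element) -> Dict[str, str]:
    """Collapse <metadata><key>text</key>...</metadata> into {key: text}."""
    # Skip comments/processing instructions, whose .tag is not a string;
    # an element without text maps to an empty string.
    return {
        child.tag: child.text or ""
        for child in metadata
        if isinstance(child.tag, str)
    }


def dict_to_metadata(data: Dict[str, str], tag: str = "metadata") -> ET.Element:
    """Expand {key: text} into <metadata><key>text</key>...</metadata>."""
    root = ET.Element(tag)
    for key, text in data.items():
        # Assumes key is a valid XML name; ElementTree does not validate it.
        ET.SubElement(root, key).text = text
    return root


article = ET.fromstring(
    "<article><title>Hello</title><metadata>"
    "<md_key_1>text_content_1</md_key_1>"
    "<md_key_2>text_content_2</md_key_2>"
    "</metadata></article>"
)
metadata = metadata_to_dict(article.find("metadata"))
print(metadata)  # {'md_key_1': 'text_content_1', 'md_key_2': 'text_content_2'}
print(ET.tostring(dict_to_metadata(metadata), encoding="unicode"))
# <metadata><md_key_1>text_content_1</md_key_1><md_key_2>text_content_2</md_key_2></metadata>
```

Because dict keys are unique, a tag that appears more than once under metadata keeps only its last value.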

The other approach I expected to work was to set metadata=Field(exclude=True), implement a @computed_element for serialization, and use a @field_validator for deserialization. Unfortunately, the @field_validator part doesn't work:

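from typing import Any, Dict, Optional

from pydantic import Field, field_validator
from pydantic_xml import BaseXmlModel, element

class Article(BaseXmlModel, tag="article"):
    # element() binds title to the <title> child rather than to <article>'s text
    title: str = element()
    metadata: Dict[str, str] = Field(exclude=True)

    @field_validator('metadata', mode='before')
    @classmethod
    def decode_content(cls, value: Any) -> Optional[Dict[str, str]]:
        print(value)
        assert False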

if __name__ == "__main__":
    TEST_INPUT = """\
<article>
<title>Hello</title>
<metadata>
<md_key_1>text_content_1</md_key_1>
<md_key_2>text_content_2</md_key_2>
</metadata>
</article>
"""
    Article.from_xml(TEST_INPUT)

This prints:

  [line -1]: Assertion failed,  [type=assertion_error, input_value={}, input_type=dict]

i.e., the validator receives an empty dict, nothing from which the child elements could be reconstructed.

Is there a currently supported approach that I'm missing?

Thanks!

dapper91 commented 2 days ago

@zygi Hi,

Right now there is no way to model an element with dynamic tags. The workaround I see is the following:

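from typing import Any

from lxml import etree

from pydantic_xml import BaseXmlModel, element
from pydantic import model_validator

class Article(BaseXmlModel, tag="article", arbitrary_types_allowed=True):
    title: str = element()
    # Raw <metadata> sub-element; stays None if the document has none.
    metadata_raw: etree._Element = element(tag='metadata', default=None)

    @property
    def metadata(self) -> dict[str, str]:
        # Read-only dict view of the raw element: child tag -> child text.
        if self.metadata_raw is None:
            return {}
        # Skip comments/processing instructions, whose .tag is not a string.
        return {el.tag: el.text for el in self.metadata_raw if isinstance(el.tag, str)}

    @model_validator(mode='before')
    @classmethod
    def set_metadata_raw(cls, data: Any) -> Any:
        # Accept metadata={...} in the constructor by converting the dict
        # into a raw <metadata> element before field validation runs.
        if isinstance(data, dict) and (metadata := data.pop('metadata', None)):
            data['metadata_raw'] = metadata_raw = etree.Element('metadata')
            for tag, text in metadata.items():
                sub = etree.SubElement(metadata_raw, tag)
                sub.text = text

        return data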

if __name__ == "__main__":
    TEST_INPUT = """\
<article>
<title>Hello</title>
<metadata>
<md_key_1>text_content_1</md_key_1>
<md_key_2>text_content_2</md_key_2>
</metadata>
</article>
"""
    article = Article.from_xml(TEST_INPUT)
    print(article)
    print(article.metadata)
    print(article.to_xml().decode())

    article = Article(title='Hello', metadata={'md_key_1': 'text_content_1', 'md_key_2': 'text_content_2'})
    print(article)
    print(article.metadata)
    print(article.to_xml().decode())

Output:

title='Hello' metadata_raw=<Element metadata at 0x1057376c0>
{'md_key_1': 'text_content_1', 'md_key_2': 'text_content_2'}
<article><title>Hello</title><metadata>
<md_key_1>text_content_1</md_key_1>
<md_key_2>text_content_2</md_key_2>
</metadata>
</article>
title='Hello' metadata_raw=<Element metadata at 0x105811b80>
{'md_key_1': 'text_content_1', 'md_key_2': 'text_content_2'}
<article><title>Hello</title><metadata><md_key_1>text_content_1</md_key_1><md_key_2>text_content_2</md_key_2></metadata></article>