Mixed Content? - Githubissues

npmccallum commented 3 months ago

I'm not sure if this is a feature request, documentation request or a user question.

I have some XML like this:

<foo>
    first text
    <bar>second text</bar>
    third text
</foo>

How can I model this? The ordering of the text is significant. So I basically need something like:

class Bar(BaseXmlModel):
    body: str

class Foo(BaseXmlModel):
    body: list[str | Bar]

But of course that doesn't work. What can I do?

dapper91 commented 3 months ago

@npmccallum Hi,

"first text" and "second text" can be extracted like this:

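from pydantic_xml import BaseXmlModel

# The document from the question, whitespace preserved
xml = '''<foo>
    first text
    <bar>second text</bar>
    third text
</foo>'''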

class Bar(BaseXmlModel, tag='bar'):
    text: str

class Foo(BaseXmlModel, tag='foo'):
    text: str
    bar: Bar

foo = Foo.from_xml(xml)
assert foo.text == '\n    first text\n    '
assert foo.bar.text == 'second text'

Unfortunately, element tails are not supported yet. The simplest way to extract "third text" right now is to use a raw element:

from lxml.etree import _Element as Element
from pydantic_xml import BaseXmlModel, element

class Foo(BaseXmlModel, tag='foo', arbitrary_types_allowed=True):
    text: str
    bar: Element = element('bar')

    @property
    def bar_text(self):
        return self.bar.text

    @bar_text.setter
    def bar_text(self, text: str):
        self.bar.text = text

    @property
    def bar_tail(self):
        return self.bar.tail

    @bar_tail.setter
    def bar_tail(self, tail: str):
        self.bar.tail = tail

foo = Foo.from_xml(xml)
assert foo.text == '\n    first text\n    '
assert foo.bar_text == 'second text'
assert foo.bar_tail == '\n    third text\n'

npmccallum commented 3 months ago

@dapper91 Thanks for the quick response. My real use case is significantly more complex than the simple one I gave. I have dozens of child tags that are interspersed with text. So I really need something like list[str | TypeOne | TypeTwo ... TypeN]. Do you know how difficult this might be to implement?

dapper91 commented 3 months ago

@npmccallum I think it is possible to add support for element tails. The problem is that in XML parsers (etree, lxml) the tail text belongs to the sub-element it follows, not to the parent element. In your example, the tail will be bound to Bar, not to Foo.
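To illustrate where the parser attaches each piece of text, here is a minimal sketch using the standard library's xml.etree.ElementTree (lxml behaves the same way for text and tail). The document string is the one from the question.

```python
import xml.etree.ElementTree as ET

xml = "<foo>\n    first text\n    <bar>second text</bar>\n    third text\n</foo>"

foo = ET.fromstring(xml)
bar = foo.find("bar")

# Text before the first child is stored on the parent as .text
print(repr(foo.text))
# Text inside <bar> is stored on <bar> as .text
print(repr(bar.text))
# Text after </bar> is stored on <bar> as .tail, not on <foo>
print(repr(bar.tail))
# The root has nothing after it, so its own tail is None
print(repr(foo.tail))
```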

So the models would be described like this (assuming a document with several <bar> elements interleaved with text):

from pydantic_xml import BaseXmlModel  # tail() below is the proposed helper, not implemented yet

class Bar(BaseXmlModel, tag='bar'):
    text: str
    tail: str = tail()

class Foo(BaseXmlModel, tag='foo'):
    text: str
    bars: list[Bar]

foo = Foo.from_xml(xml)
assert foo.text == '\n    first text\n    '
assert foo.bars[0].text == 'second text'
assert foo.bars[0].tail == '\n    third text\n'
assert foo.bars[1].text == 'fourth text'
assert foo.bars[1].tail == '\n    fifth text\n'
# and so on

Will that be helpful?
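Until tails are supported, the ordered sequence asked about above can be rebuilt by hand from text and tail. A rough sketch with the standard library only: the helper name mixed_content is hypothetical and not part of pydantic-xml, and stripping whitespace-only chunks is an assumption that may not suit every document.

```python
import xml.etree.ElementTree as ET
from typing import Union


def mixed_content(parent: ET.Element) -> list[Union[str, ET.Element]]:
    """Return text chunks and child elements of `parent` in document order."""
    items: list[Union[str, ET.Element]] = []
    # Leading text lives on the parent itself
    if parent.text and parent.text.strip():
        items.append(parent.text.strip())
    for child in parent:
        items.append(child)
        # Text following a child lives on that child's tail
        if child.tail and child.tail.strip():
            items.append(child.tail.strip())
    return items


xml = "<foo>\n    first text\n    <bar>second text</bar>\n    third text\n</foo>"
foo = ET.fromstring(xml)
items = mixed_content(foo)

# Show strings as-is and elements by tag name
rendered = [i if isinstance(i, str) else f"<{i.tag}>" for i in items]
print(rendered)
```

Each element in the result could then be handed to the matching model, while the strings keep their position between the elements.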

dapper91 / pydantic-xml

Mixed Content? #176