dapper91 / pydantic-xml

python xml for humans
https://pydantic-xml.readthedocs.io
The Unlicense

Error while parsing wrapped attribute #218

Open ghilesmeddour opened 1 week ago

ghilesmeddour commented 1 week ago

Hi @dapper91 again :blush:,

I'm trying to use the lib to parse HL7 aECG files.

Let's take this file as an example (its URL is the file_url value in the program below).

I have the following program:

from typing import Optional
from urllib.request import urlopen
from uuid import UUID

from pydantic_xml import BaseXmlModel, attr, element, wrapped

class EffectiveTime(BaseXmlModel, tag="effectiveTime", search_mode="unordered"):
    center: str = wrapped("center", attr(name="value"))

class Code(BaseXmlModel, tag="code", search_mode="unordered"):
    code: str = attr()
    code_system: str = attr(name="codeSystem")
    code_system_name: Optional[str] = attr(name="codeSystemName", default=None)
    display_name: Optional[str] = attr(name="displayName", default=None)

class AnnotatedECG(BaseXmlModel, nsmap={"": "urn:hl7-org:v3"}, search_mode="unordered"):
    id: UUID = wrapped("id", attr(name="root"))
    code: Code
    text: Optional[str] = element(default=None)
    effective_time: EffectiveTime

    # This works:
    # center: str = wrapped("effectiveTime/center", attr(name="value"))

file_url = "https://raw.githubusercontent.com/FDA/aecg-python/refs/heads/main/src/aecg/data/hl7/2003-12%20Schema/example/Example%20aECG.xml"

with urlopen(file_url) as f:
    xml_doc = f.read()

aecg_o = AnnotatedECG.from_xml(xml_doc)

print(aecg_o)

And I have the following error:

pydantic_core._pydantic_core.ValidationError: 1 validation error for AnnotatedECG
effective_time.center
  [line -1]: Field required [type=missing, input_value={}, input_type=dict]

I can't understand why the error occurs, especially since the commented-out line, which looks equivalent to me, works correctly. What am I doing wrong?

dapper91 commented 1 week ago

@ghilesmeddour Hi,

Namespaces are not inherited by nested models, so they must be explicitly defined for all models:

from typing import Optional
from uuid import UUID

from pydantic_xml import BaseXmlModel, attr, element, wrapped

NSMAP = {"": "urn:hl7-org:v3"}

class EffectiveTime(BaseXmlModel, tag="effectiveTime", nsmap=NSMAP, search_mode="unordered"):
    center: str = wrapped("center", attr(name="value"))

class Code(BaseXmlModel, tag="code", nsmap=NSMAP, search_mode="unordered"):
    code: str = attr()
    code_system: str = attr(name="codeSystem")
    code_system_name: Optional[str] = attr(name="codeSystemName", default=None)
    display_name: Optional[str] = attr(name="displayName", default=None)

class AnnotatedECG(BaseXmlModel, nsmap=NSMAP, search_mode="unordered"):
    id: UUID = wrapped("id", attr(name="root"))
    code: Code
    text: Optional[str] = element(default=None)
    effective_time: EffectiveTime

    # This works:
    # center: str = wrapped("effectiveTime/center", attr(name="value"))
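
To illustrate why the nested lookup came back empty: in a document with a default namespace, every element name is qualified with the namespace URI, so a search by bare tag name does not match. The sketch below uses the standard library's xml.etree.ElementTree rather than pydantic-xml itself, and SAMPLE is a made-up fragment shaped like the aECG header, not the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment, not the real aECG example file.
SAMPLE = """
<AnnotatedECG xmlns="urn:hl7-org:v3">
  <effectiveTime>
    <center value="20021122091000"/>
  </effectiveTime>
</AnnotatedECG>
"""

NS = {"hl7": "urn:hl7-org:v3"}
root = ET.fromstring(SAMPLE)

# Locate the parent element using the namespace, as the outer model does.
effective_time = root.find("hl7:effectiveTime", NS)

# Searching the child by bare name finds nothing, which mirrors a nested
# model declared without nsmap.
unqualified = effective_time.find("center")

# Searching with the namespace applied finds the element.
qualified = effective_time.find("hl7:center", NS)

print(unqualified is None)
print(qualified.get("value"))
```

The same rule applies at every nesting level, which is why each model needs its own nsmap.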

ghilesmeddour commented 6 days ago

Thank you very much @dapper91 for your help.

I'll take the opportunity to ask another question. The input files can have different namespaces ([None, "urn:hl7-org:v1", "urn:hl7-org:v2", "urn:hl7-org:v3"]), and I want to parse them all the same way. Is there a way to do this transparently: either to ignore namespaces completely, or to specify the namespace at parsing time (as a from_xml parameter, for example) rather than in the model definitions?
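
One possible workaround, offered as a sketch and not as a pydantic-xml feature, is to strip namespaces from the document before passing it to from_xml, so that a single set of models declared without nsmap covers every variant. The helper name strip_namespaces is hypothetical, and it uses only the standard library. Whether pydantic-xml has a built-in option for this is not established in this thread.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(xml_doc: bytes) -> bytes:
    """Return xml_doc with namespace URIs removed from element and attribute names."""
    root = ET.fromstring(xml_doc)
    for el in root.iter():
        # ElementTree stores qualified names as "{uri}local"; keep only "local".
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
        # Do the same for namespaced attributes (e.g. xsi:schemaLocation).
        for name in list(el.attrib):
            if name.startswith("{"):
                el.attrib[name.split("}", 1)[1]] = el.attrib.pop(name)
    return ET.tostring(root)


# Two hypothetical inputs that differ only in their namespace.
doc_v3 = b'<AnnotatedECG xmlns="urn:hl7-org:v3"><effectiveTime><center value="1"/></effectiveTime></AnnotatedECG>'
doc_plain = b'<AnnotatedECG><effectiveTime><center value="1"/></effectiveTime></AnnotatedECG>'

normalized_v3 = strip_namespaces(doc_v3)
normalized_plain = strip_namespaces(doc_plain)
```

With the models from this thread declared without nsmap, the call would then be AnnotatedECG.from_xml(strip_namespaces(xml_doc)). Note the trade-off: names that differ only by namespace collapse into one, so this only suits documents where that cannot cause collisions.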