dapper91 / pydantic-xml

python xml for humans
https://pydantic-xml.readthedocs.io
The Unlicense

`Any` no longer a valid field type in v2 #100

Open Jacob-Flasheye opened 10 months ago

Jacob-Flasheye commented 10 months ago

(Disclaimer: This is my work account and I'm posting this on behalf of my $work.)

Hi.

I'm working on upgrading our code to v2, and the only pydantic_xml-related snag I've hit is that Any is no longer a valid type. I think I can work around this by using generics, but I'm curious as to why support for it has been removed. Could you explain why Any is no longer a valid field type and whether there are any plans to make it a valid field type again?

dapper91 commented 10 months ago

@Jacob-Flasheye Hi.

Could you please provide some examples of how you use Any typed fields? The reason it has been removed is that it is not possible to analyze the field type while building the model serializer and choose the correct serializer type (scalar, collection, mapping, ...).
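To illustrate the point about choosing a serializer from the annotation, here is a minimal sketch using only the standard library. It is not pydantic-xml's actual implementation; the function name `serializer_kind` and the three category strings are assumptions made for this example.

```python
from typing import Any, Dict, List, get_origin


def serializer_kind(annotation: object) -> str:
    """Pick a serializer family from a type annotation alone (illustrative only)."""
    if annotation is Any:
        # Any says nothing about the value's shape, so no choice can be made up front.
        raise TypeError("cannot choose a serializer for Any")
    # get_origin returns list for List[int], dict for Dict[str, str], None for plain types.
    origin = get_origin(annotation) or annotation
    if origin in (list, tuple, set):
        return "collection"
    if origin is dict:
        return "mapping"
    return "scalar"


print(serializer_kind(int))
print(serializer_kind(List[int]))
print(serializer_kind(Dict[str, str]))
```

With `int`, `List[int]` and `Dict[str, str]` the shape is known before any value exists; with `Any` it is not, which is the situation described above.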

In v1, Any typed fields were interpreted as scalar typed ones (pydantic 1 did that and the library relied on it) and raised an exception during serialization if they weren't, which led to some unexpected behavior. For example:

class Model(BaseXmlModel):
    field: Any = element()

Model(field=1).to_xml()  # Ok
Model(field=[1, 2, 3]).to_xml()  # Error

whereas

class Model(BaseXmlModel):
    field: List[int] = element()

Model(field=[1, 2, 3]).to_xml()  # Ok

Jacob-Flasheye commented 10 months ago

Thanks for the quick reply, appreciate your work :pray:

The problem at hand is generating pydantic_xml models from xml schema files, where I don't control the files. Some of the types in the schema contain my worst enemies: xs:any and xs:anyAttribute. Here is perhaps the most pathological example:

<xs:complexType name="AnyHolder">
    <xs:sequence>
        <xs:any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="unbounded"/> 
    </xs:sequence>
    <xs:anyAttribute processContents="lax"/>
</xs:complexType>

In v1 I could do as you show in your first example, and while serialization errored out, I could still instantiate the model and do the serialization by hand. The problem is that I now cannot instantiate the object. I totally agree with you that the current behaviour is better in 99% of circumstances.

What I was thinking (without knowing any of the pydantic internals) is that the serialization of an Any field could be special-cased to first look up the type of the field and then serialize it? But reading your reply I take it that such special casing or deferral is impossible.
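A rough sketch of what such a deferral could look like on the serialization side, written against the standard library's `xml.etree.ElementTree` rather than pydantic-xml. The helper `append_any` and its mapping-to-attributes rule are assumptions for illustration only. Note that this only covers writing XML; parsing would still need to know the target shape in advance.

```python
import xml.etree.ElementTree as ET
from typing import Any


def append_any(parent: ET.Element, tag: str, value: Any) -> None:
    """Serialize value under parent, choosing the shape from the runtime type."""
    if isinstance(value, (list, tuple)):
        # A collection becomes one repeated sub-element per item.
        for item in value:
            append_any(parent, tag, item)
    elif isinstance(value, dict):
        # A mapping becomes a single sub-element carrying attributes.
        ET.SubElement(parent, tag, {str(k): str(v) for k, v in value.items()})
    else:
        # Everything else is treated as a scalar and written as text.
        ET.SubElement(parent, tag).text = str(value)


root = ET.Element("Model")
append_any(root, "field", [1, 2, 3])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```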

I'm currently leaning towards representing the model as

class AnyHolder(Generic[ElemT, AttrT], BaseModel):  # I will admit I haven't looked up the specifics of generic models in pydantic v2
    any_elem: list[ElemT] = element()
    any_attr: AttrT = attr()

but technically that model is wrong (ElemT needs to be a TypeVarTuple, which requires support first from pydantic and then here, neither of which is guaranteed), and I'm not sure that I cannot special case the code using the generated models.

One solution that would work well here (I think) is the suggestion to support raw xml in #14. I think that would cover xs:any but it might still not deal with xs:anyAttribute. Also, I'm not sure how technically feasible that is.

Sorry for the ranting nature of this post, I'm not frustrated with pydantic_xml (I love it!), but I am a bit frustrated with the xml schema we're working with... Please tell me if you need any more information!

dapper91 commented 10 months ago

@Jacob-Flasheye Hi,

Thanks for the thorough answer!

I am working on raw xml fields right now but I am not sure it will solve your pain with xs:any. You could define your model like this:

class AnyHolder(BaseXmlModel): 
    any_elem: list[etree.Element] = element()

but it will not fit your schema since the sequence element may have any tag.

Speaking of xs:anyAttribute, I am not sure the any_attr: AttrT = attr() definition is correct. As far as I know, xs:anyAttribute means the element may have any number of attributes with any name. In your model only one attribute is allowed. So it seems to me this definition is more accurate:

class AnyHolder(BaseXmlModel):
    any_attrs: Dict[str, str]

Although raw xml still could help you. Is AnyHolder a root element in your schema? I will explain what I am getting at. It seems to me AnyHolder is weakly defined and it is not possible to define a model for that. Maybe you could use raw xml instead of the model itself:

class AnyHolder(BaseXmlModel, tag='AnyHolder'):
    ...

class OuterModel(BaseXmlModel, arbitrary_types_allowed=True):
    any_holder_raw: etree.Element = element(tag='AnyHolder', exclude=True)

    @computed_element
    def any_holder(self) -> AnyHolder:
        # manual parsing here
        ...

Will that help you to deal with AnyHolder?
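For the "manual parsing" step, a minimal sketch under the assumption that the raw element behaves like a standard `xml.etree.ElementTree.Element` (lxml elements expose the same iteration and `attrib` interface). `ParsedAnyHolder` and `parse_any_holder` are hypothetical names, not part of pydantic-xml.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParsedAnyHolder:  # hypothetical container, not part of pydantic-xml
    children: List[ET.Element] = field(default_factory=list)
    attrs: Dict[str, str] = field(default_factory=dict)


def parse_any_holder(raw: ET.Element) -> ParsedAnyHolder:
    """Keep every child element and every attribute, whatever they are called."""
    # Iterating an Element yields its direct children; attrib maps names to values.
    return ParsedAnyHolder(children=list(raw), attrs=dict(raw.attrib))


raw = ET.fromstring('<AnyHolder a="1"><foo>x</foo><bar/></AnyHolder>')
parsed = parse_any_holder(raw)
print([child.tag for child in parsed.children], parsed.attrs)
```

This covers both wildcards from the schema: the child list corresponds to xs:any and the attribute dictionary to xs:anyAttribute.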

Jacob-Flasheye commented 10 months ago

Wow, I didn't know anyAttribute also meant any number of attributes, thanks for informing me.

AnyHolder was really just the first example I could find but your suggestions still hold true for most other cases (in fact, AnyHolder is not referenced anywhere else in the schema...). But I agree that your posted solution would most likely help if I need to deal with AnyHolder or similar elements!

My current solution looks like this:

I think this issue can be closed, it seems you are aware of the problems I've mentioned and you've carefully given suggestions on what I can do. Now it's up to me to use those suggestions!

Jacob-Flasheye commented 6 months ago

There's one more thing to this that I hadn't thought of until I just now started generating xmlschema elements, and that is that they can have any name. IIUC pydantic_xml currently requires you to specify the exact name of the elements. Would it be possible to add some mechanism to capture all elements that aren't assigned, irrespective of their name? I don't know exactly how that would work internally, but it could be set through a class argument, with the elements accessed through some name like any_:

class CaptureUnassignedElements(BaseXmlModel, arbitrary_types_allowed=True, capture_unassigned_elements=True):
    test: int = element(tag="test")

input_str = """
    <CaptureUnassignedElements>
        <test>23</test>
        <abc123>test</abc123>
        <empty/>
    </CaptureUnassignedElements>
"""

cua_test = CaptureUnassignedElements.from_xml(input_str)

print(cua_test.test)  # prints 23
print(cua_test.any_)  # prints [lxml.etree._Element(...), lxml.etree._Element(...)] or something similar

I'm not sure if this is good design, nor how hard it is to do, but it does reflect a real use case caused by xmlschema so I think it should at least be considered.
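To make the requested behaviour concrete, here is a small sketch of the same partitioning done by hand with the standard library's `xml.etree.ElementTree`. It is not an existing pydantic-xml feature; `split_children` and the `known_tags` argument are names invented for this example.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple


def split_children(
    root: ET.Element, known_tags: Set[str]
) -> Tuple[List[ET.Element], List[ET.Element]]:
    """Partition direct children into declared ones and leftovers."""
    known = [child for child in root if child.tag in known_tags]
    unassigned = [child for child in root if child.tag not in known_tags]
    return known, unassigned


input_str = """
    <CaptureUnassignedElements>
        <test>23</test>
        <abc123>test</abc123>
        <empty/>
    </CaptureUnassignedElements>
"""

root = ET.fromstring(input_str)
known, unassigned = split_children(root, {"test"})
print(int(known[0].text))
print([child.tag for child in unassigned])
```

The `unassigned` list plays the role of the proposed `any_` attribute: it holds the `abc123` and `empty` elements that no declared field claimed.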

jwfraustro commented 3 months ago

I'm bumping into this as well. I understand that it frankly goes against the nature and benefit of these strongly defined models, but it is a use case I nevertheless have to support.

The schema definition for the element is:

<xs:element name="jobInfo" maxOccurs="1" minOccurs="0">
    <xs:annotation>
        <xs:documentation> This is arbitrary information that can be added to the job description by
            the UWS implementation. </xs:documentation>
    </xs:annotation>
    <xs:complexType>
        <xs:sequence>
            <xs:any namespace="##any" processContents="lax" minOccurs="0" maxOccurs="unbounded" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

which, in practice, might look like:

<uws:jobInfo>
  <any>
    <xml>
      <thatyouwant />
    </xml>
  </any>
</uws:jobInfo>

which I'm struggling to support.

I've documented some more of this in the issue tracker for the project here: https://github.com/spacetelescope/vo-models/issues/18

fleimgruber commented 3 months ago

@Jacob-Flasheye I also need to generate models from XSD files (which you described in https://github.com/dapper91/pydantic-xml/issues/100#issuecomment-1689387730) so would be interested in collaborating here.

@dapper91 would this be something that you would be willing to integrate? I know that XSD can be thorny, but that feature would be a natural fit for pydantic-xml.

edit: Could we maybe partially support https://docs.pydantic.dev/latest/concepts/models/#dynamic-model-creation in pydantic-xml so that we could use e.g. https://xmlschema.readthedocs.io/en/latest/usage.html#meta-schemas-and-xsd-sources for dynamic model creation? This task alone would not need support for serializing (e.g. https://github.com/dapper91/pydantic-xml/issues/92) so might be feasible?
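As a rough idea of the first half of that pipeline, the sketch below reads field names and types out of an XSD complexType using only the standard library. The schema, the `XSD_TO_PY` mapping and `field_specs` are made up for illustration, and the type lookup assumes the schema uses the `xs:` prefix literally (a real implementation would resolve QName prefixes). The resulting dictionary is the kind of input a dynamic model factory could consume.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

XS = "{http://www.w3.org/2001/XMLSchema}"
# Deliberately tiny, illustrative mapping from XSD built-ins to Python types.
XSD_TO_PY = {"xs:int": int, "xs:string": str, "xs:boolean": bool}

schema = ET.fromstring("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Job">
    <xs:sequence>
      <xs:element name="jobId" type="xs:string"/>
      <xs:element name="runs" type="xs:int"/>
      <xs:any namespace="##any" processContents="lax" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
""")


def field_specs(complex_type: ET.Element) -> Tuple[Dict[str, type], bool]:
    """Return ({field name: Python type}, whether an xs:any wildcard is present)."""
    fields: Dict[str, type] = {}
    has_wildcard = False
    for node in complex_type.iter():
        if node.tag == XS + "element":
            # Unknown XSD types fall back to str in this sketch.
            fields[node.get("name")] = XSD_TO_PY.get(node.get("type"), str)
        elif node.tag == XS + "any":
            has_wildcard = True
    return fields, has_wildcard


job_type = schema.find(XS + "complexType")
fields, has_wildcard = field_specs(job_type)
print(fields, has_wildcard)
```

The wildcard flag is kept separate because, as discussed earlier in this thread, xs:any has no direct typed equivalent and would need a raw-element or manual-parsing fallback.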

dapper91 commented 1 month ago

I added experimental support for dynamic model creation in 2.10.0.