Hi @gatkin,
This looks like a great package, providing a much saner way to interact with XML data; the documentation is complete and clear as well.
Primitive processors
Being able to use both classes and namedtuples is very convenient, but I feel there's some duplication of info going on if you're using type annotations (as added in recent Python versions). To demonstrate what I mean:

    from dataclasses import dataclass

    import declxml as xml

    @dataclass
    class Extent:
        xmin: float
        ymin: float
        xmax: float
        ymax: float

    # One child processor per field, each repeating the type from the annotation.
    extent_processor = xml.user_object("extent", Extent, [
        xml.floating_point("xmin"),
        xml.floating_point("ymin"),
        xml.floating_point("xmax"),
        xml.floating_point("ymax"),
    ])

I'm stating twice that the attributes should be floats. It's pretty straightforward to define a function which does this for you:
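Something along these lines, as a minimal sketch rather than a finished implementation. The name `make_simple_processor` is made up, and the declxml module is handed in as the `xml` argument purely so the snippet carries no import of its own; it is assumed to expose the usual `floating_point`, `integer`, `string`, `boolean` and `user_object` factories.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class Extent:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def make_simple_processor(datacls, xml):
    """Build a user_object processor from a flat dataclass.

    `xml` is assumed to be the declxml module (hypothetical parameter).
    """
    # Map annotation types to declxml's primitive processor factories.
    type_mapping = {
        float: xml.floating_point,
        int: xml.integer,
        str: xml.string,
        bool: xml.boolean,
    }
    hints = get_type_hints(datacls)  # also resolves string annotations
    child_processors = [
        type_mapping[hints[field.name]](field.name) for field in fields(datacls)
    ]
    # The element name is derived from the class name, e.g. Extent -> "extent".
    return xml.user_object(datacls.__name__.lower(), datacls, child_processors)
```

With `import declxml as xml` in place, `make_simple_processor(Extent, xml)` should then stand in for the hand-written `extent_processor` above.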
This is all you need for simple processors (for typing.NamedTuple as well, mutatis mutandis).
Aggregate processors
Aggregate processors are easy to include via recursion, although you probably want to encode the "aggregateness" somewhere. After some playing around, I find encoding it in the type to be most straightforward:
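
    import abc
    from dataclasses import dataclass

    import declxml as xml

    # Maps annotation types to declxml primitive factories (extend as needed).
    type_mapping = {float: xml.floating_point, str: xml.string}

    class Aggregate(abc.ABC):
        """Marker base class: fields of this type get their own nested processor."""

    @dataclass
    class Extent(Aggregate):
        xmin: float
        ymin: float
        xmax: float
        ymax: float

    @dataclass
    class SpatialData(Aggregate):
        epsg: str
        extent: Extent

    def make_processor(datacls):
        fields = []
        for name, vartype in datacls.__annotations__.items():
            # The isinstance guard avoids a TypeError for non-class annotations.
            if isinstance(vartype, type) and issubclass(vartype, Aggregate):
                # Recurse; the nested element is named after the class, not the field.
                field = make_processor(vartype)
            else:
                xml_type = type_mapping[vartype]
                field = xml_type(name)
            fields.append(field)
        return xml.user_object(datacls.__name__.lower(), datacls, fields)

    spatialdata_processor = make_processor(SpatialData)
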
This provides a very concise way of defining (nested) data structures -- which I'd generally want to do anyway -- and turning them into XML processors with a single function call plus a new base class (which can even be monkey-patched at runtime, if needed).
I'm not sure you'd really want to put this in declxml (see the trouble below), but I do think it's useful (and non-trivial) enough to maybe warrant a section in the documentation. What do you reckon?
Optional, List, etc
I haven't tried it yet, but I'm pretty sure you can use typing.Optional and typing.List to map to the declxml equivalents.
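As a rough sketch of the inspection half only: the helper below classifies an annotation with `typing.get_origin` and `typing.get_args` (Python 3.8+). The name `unwrap` and the kind labels are made up for this example, and the declxml side is left out; presumably "optional" would translate to `required=False` and "list" to `xml.array`, but I haven't verified that wiring.

```python
from typing import List, Optional, Union, get_args, get_origin


def unwrap(vartype):
    """Classify an annotation as ("optional" | "list" | "plain", inner type)."""
    origin = get_origin(vartype)
    args = get_args(vartype)
    # Optional[T] is Union[T, None], so look for a two-member union with NoneType.
    if origin is Union and len(args) == 2 and type(None) in args:
        inner = args[0] if args[1] is type(None) else args[1]
        return "optional", inner
    if origin is list:
        # A bare `List` carries no arguments, so there is no item type to report.
        return "list", args[0] if args else None
    return "plain", vartype


print(unwrap(Optional[float]))
print(unwrap(List[int]))
print(unwrap(str))
```

The inner type can be fed back into the same type lookup (or recursion) used for plain fields, so the wrappers only decide how the resulting processor is configured.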
Hiccups
There's some trouble due to the fact that XML separates attributes from elements. For the XML files I'm working with, I don't really see a reason to distinguish between the two (of course, neither does JSON, or TOML, etc.). But you need to encode it somehow, or the value won't end up in the right place in the XML. I can solve it in a slightly hacky way, by (ab)using typing.Union:
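
    import abc
    from dataclasses import dataclass
    from typing import Union

    class Attribute(abc.ABC):
        """Marker type: Union[Attribute, T] flags a field as an XML attribute."""

    @dataclass
    class Example:
        a: Union[Attribute, int]
        b: int
        c: int

    example = Example(1, 2, 3)
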
To write XML like this:

    <example a="1">
        <b>2</b>
        <c>3</c>
    </example>

We can check again by inspecting the annotations:

    def is_union(vartype):
        # True for annotations such as Union[Attribute, int].
        return hasattr(vartype, "__args__") and (vartype.__args__[0] is Attribute)

This shouldn't trip up any type checker, but it is clearly not quite the intended use: you'll never provide an Attribute as the value.
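To show how that check could feed into processor generation, here is a hedged sketch. The names `split_attribute` and `make_flat_processor` are hypothetical, and the declxml module is again passed in as `xml` so the snippet has no import of its own. It also leans on my reading of declxml's attribute support, namely that an element path of `"."` combined with `attribute=...` targets an attribute of the enclosing element; please double-check that against the docs.

```python
import abc
from dataclasses import dataclass
from typing import Union, get_args, get_type_hints


class Attribute(abc.ABC):
    """Marker only: Union[Attribute, T] means "T, stored as an XML attribute"."""


@dataclass
class Example:
    a: Union[Attribute, int]
    b: int
    c: int


def split_attribute(vartype):
    """Return (is_attribute, value_type) for an annotation."""
    args = get_args(vartype)
    if args and args[0] is Attribute:
        return True, args[1]
    return False, vartype


def make_flat_processor(datacls, xml):
    # `xml` is assumed to be the declxml module (hypothetical parameter).
    type_mapping = {float: xml.floating_point, int: xml.integer, str: xml.string}
    children = []
    for name, vartype in get_type_hints(datacls).items():
        is_attribute, value_type = split_attribute(vartype)
        factory = type_mapping[value_type]
        if is_attribute:
            # Assumption: "." addresses the element being processed, so the
            # value lives in an attribute of <example> itself.
            children.append(factory(".", attribute=name))
        else:
            children.append(factory(name))
    return xml.user_object(datacls.__name__.lower(), datacls, children)
```

Pulling the value type out of the union keeps the type lookup identical for attributes and elements; only the way the primitive processor is constructed differs.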
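There are more issues with the fact that sometimes you need to include names that aren't part of the dataclass or the namedtuple, e.g. an array in the XML where every entry is tagged "item". I can't use something as general as "item" as my class name. Of course, I can just fall back to regular use at any time, and provide the name which is only part of the processor, not of the dataclass.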
At any rate, you can just mix and match as needed: when everything's encoded in the dataclass or namedtuple, you can generate the processors automatically; if not, you just have to write a few extra lines or provide an explicit name.
Similarly, there are cases where aliases are required. In my case, I'm lowercasing class names and replacing underscores with dashes, so it's sorta implicitly defined. Stuff like this makes me think it might be smarter to let the user figure out the details of their idiosyncratic XML format, and provide a "base recipe" to help them along a little.
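As an illustration of what a small piece of such a recipe might look like, the helper below covers both the implicit convention and the explicit-name fallback. The function name `xml_name` and its `override` parameter are hypothetical; nothing like it exists in declxml as far as I know.

```python
def xml_name(python_name, override=None):
    """Return the XML element name for a Python class or field name."""
    if override is not None:
        # Explicit name, for tags that make poor Python identifiers (e.g. "item").
        return override
    # Implicit convention: lowercase, with underscores turned into dashes.
    return python_name.lower().replace("_", "-")


print(xml_name("SpatialData"))             # spatialdata
print(xml_name("coordinate_system"))       # coordinate-system
print(xml_name("Layer", override="item"))  # item
```

A generated processor would call this wherever it currently uses the lowercased class name, so each user can swap in their own convention without touching the rest.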
Or perhaps you see a better way that is nice and general?