Open alecandido opened 1 year ago
Much better, `pydantic` seems to do it, so we might get it together with #177
https://www.attrs.org/en/stable/why.html

`attrs` is another candidate, together with its companion `cattrs`.
https://threeofwands.com/why-i-use-attrs-instead-of-pydantic/
This actually convinced me that we want Pydantic, rather than attrs, since we want to validate runcards. And if someone saves me the burden of writing the validator (or part of it), I'm only grateful.
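For illustration, a minimal sketch of the kind of runcard validation Pydantic provides out of the box (the `Runcard` class and its field names are hypothetical, not the actual runcard schema):

```python
from pydantic import BaseModel, ValidationError


class Runcard(BaseModel):
    # Hypothetical runcard fields, just to show type validation
    order: int
    q0: float


ok = Runcard(order=2, q0=1.65)  # accepted

try:
    Runcard(order="NLO", q0=1.65)  # "NLO" is not coercible to int
except ValidationError:
    print("rejected")
```

This is exactly the validator we would otherwise write by hand: wrong types are rejected at construction time, before any internal code sees the values.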
Hi, @AleCandido
> In order to reduce the burden of maintenance of the serialization part, it is worth considering external libraries, especially if they are popular enough.
This is a good idea, especially if you're interested in serialization of variadic generics from the new PEP 646. It has a lot of edge cases, but I managed to cope with them in mashumaro, which I would suggest you try even if you have more common use cases.
@Fatal1ty another rather nasty type that requires some special care here is `npt.NDArray`. Have you ever tested it?
> @Fatal1ty another rather nasty type that requires some special care here is `npt.NDArray`. Have you ever tested it?
For non-standard types like this one, a custom universal serialization strategy can be registered. But since `npt.NDArray` isn't equal to `np.ndarray`, such a strategy should be registered for any scalar type at the moment:
```python
import io
from dataclasses import dataclass

import numpy as np
import numpy.typing as npt
from mashumaro import DataClassDictMixin
from mashumaro.config import BaseConfig
from mashumaro.types import SerializationStrategy


class NDArraySerializationStrategy(SerializationStrategy):
    def serialize(self, value: np.ndarray) -> str:
        # Dump the array in the .npy binary format, then hex-encode
        tmp_io = io.BytesIO()
        np.save(tmp_io, value, allow_pickle=False)
        return tmp_io.getvalue().hex()

    def deserialize(self, value: str) -> np.ndarray:
        # Reverse: decode the hex string and reload the .npy payload
        tmp_io = io.BytesIO(bytes.fromhex(value))
        return np.load(tmp_io, allow_pickle=False)


@dataclass
class C(DataClassDictMixin):
    x: npt.NDArray[np.float64]

    class Config(BaseConfig):
        serialization_strategy = {
            npt.NDArray[np.float64]: NDArraySerializationStrategy(),
        }
```
This could be unhandy when you have a lot of scalar variations, so I'm thinking about allowing a strategy to be set for the origin type — `np.ndarray` in this case.
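As a side note, the serialize/deserialize pair above can be exercised on its own, without mashumaro, to check that the hex round trip is lossless (the helper names here are my own, not library API):

```python
import io

import numpy as np


def ndarray_to_hex(value: np.ndarray) -> str:
    # Dump the array in the .npy binary format, then hex-encode the bytes
    buf = io.BytesIO()
    np.save(buf, value, allow_pickle=False)
    return buf.getvalue().hex()


def ndarray_from_hex(value: str) -> np.ndarray:
    # Reverse: decode the hex string and reload the .npy payload
    return np.load(io.BytesIO(bytes.fromhex(value)), allow_pickle=False)


arr = np.array([1.0, 2.5, -3.0])
restored = ndarray_from_hex(ndarray_to_hex(arr))
assert np.array_equal(arr, restored)
```

The `.npy` format preserves dtype and shape, so the round trip recovers the array exactly, at the price of a fairly verbose hex string.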
A commit from an earlier attempt is https://github.com/NNPDF/eko/commit/c6c2fa667c4a750e2bdf10b0e0d1997f84772b27
At the moment, I wrote in #172 a mini-serialization library based on `dataclass`es; it is currently called `DictLike`.

Now, `dataclass` already provides `.asdict()` natively, and a `.from_dict()` analogue (i.e. `MyDataClass(**mydict)`). This is the initial motivation to go down this way.

It is optimal, since it allows us to go through serialization while preserving type information (it lives in the runtime structure), and this is the new instance of the "input layer" currently implemented in yadism. The idea is to improve input-checking over and over by relying on custom type definitions, such that the internal library can make assumptions about those values.
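A minimal stdlib-only illustration of that round trip (the `Point` class is just a toy example, not code from the repo):

```python
from dataclasses import asdict, dataclass


@dataclass
class Point:
    # Toy example, not an actual runcard type
    x: float
    y: float


p = Point(1.0, 2.0)
d = asdict(p)   # native serialization: {'x': 1.0, 'y': 2.0}
q = Point(**d)  # the ".from_dict()" analogue
assert p == q
```

As long as all fields are plain scalars, this dict round trip is essentially free; the trouble starts with nested and generic types, as discussed below.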
Unfortunately, type hints and runtime classes are not exactly the same thing, and proper serialization with generic type hints is complex (consider that a union is a type hint, not a class; this includes `MyType | None`, formerly `typing.Optional[MyType]`, and also `list[int]`, formerly `typing.List[int]`). So, taking care of internal types ourselves does not come at zero cost, especially because many features were only introduced in later releases, and we need compatibility with py3.8 at the time of writing. This reflects the fact that types in Python are rather recent, together with their whole ecosystem (of which `dataclass` is part).

In order to reduce the burden of maintenance of the serialization part, it is worth considering external libraries, especially if they are popular enough. A good example would be lidatong/dataclasses-json, which provides a `@dataclass_json` decorator that is the equivalent of my `DictLike` base class. It is also compatible with py3.7 and py3.6 (with the backport of dataclasses, introduced in the standard library in py3.7). Unfortunately, it doesn't look so lively.

Other options are welcome.