Investigate if pydantic would be useful for metadata validation

ESapenaVentura commented 1 day ago

Currently, each subclass of metadata_entity defines a validate method.

This could be simplified by using pydantic and passing a subclass of BaseModel, but it would mean defining a data model for each new metadata entity.
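
To make "one data model per metadata entity" concrete, here is a minimal sketch of what such a model might look like. It is only an illustration: the field names (`name`, `organism`, `collection_date`) and the helper `validate_biosample` are hypothetical and not taken from biobroker.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class BiosampleModel(BaseModel):
    """Hypothetical data model for one metadata entity (fields are assumptions)."""
    name: str
    organism: str
    collection_date: Optional[str] = None


def validate_biosample(metadata_content: dict) -> list:
    """Return a list of error messages; an empty list means the content is valid."""
    try:
        # Instantiating the model is what triggers pydantic's validation.
        BiosampleModel(**metadata_content)
    except ValidationError as error:
        # Each entry in error.errors() has a 'loc' tuple (field path) and a 'msg'.
        return [
            f"{'.'.join(str(part) for part in entry['loc'])}: {entry['msg']}"
            for entry in error.errors()
        ]
    return []
```

Each new metadata entity would then need its own `BaseModel` subclass along these lines, which is the cost being weighed here.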

Is it worth it?

So far I can think of pros and cons:

Pros:

Only need to handle an InputDataError that would arise from the validation done by pydantic
Could open up the "validate" method so that users can provide their own data model (e.g. a modified or user-specific checklist)
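
Both points could be sketched roughly as follows. This is an assumption-heavy illustration: the `InputDataError` class below is a stand-in (its real definition is not shown in this thread), and `DefaultChecklist`, `StricterChecklist` and the standalone `validate` function are hypothetical names.

```python
from typing import Type

from pydantic import BaseModel, ValidationError


class InputDataError(Exception):
    """Stand-in for the library's exception; the real definition is not shown here."""


class DefaultChecklist(BaseModel):
    """Hypothetical default data model."""
    name: str


class StricterChecklist(DefaultChecklist):
    """Hypothetical user-specific checklist that requires one extra field."""
    organism: str


def validate(metadata_content: dict, model: Type[BaseModel] = DefaultChecklist) -> None:
    """Validate a plain dict against `model`; the dict itself is left untouched."""
    try:
        model(**metadata_content)
    except ValidationError as error:
        # Callers only ever have to handle a single exception type.
        raise InputDataError(str(error)) from error
```

With this shape, `validate({"name": "s1"})` passes against the default model, while `validate({"name": "s1"}, model=StricterChecklist)` raises `InputDataError` because `organism` is missing.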

Cons:

Need knowledge of how to set up a data model
pydantic, in the end, validates instantiated objects; that means the current use of metadata entities is not really suited to it (values are transformed on the fly and assigned to .entity via a very thorough __setitem__). What I mean is: refactoring to set the entity as e.g. ".entity = BiosampleModel(**metadata_content)" does not seem feasible or even valuable. pydantic only seems valuable for validation
It's a new Python object and, therefore, increases complexity. I wanted this library to be relatively simple...

ESapenaVentura commented 1 day ago

Let's set up a branch with this refactoring and decide

ESapenaVentura commented 17 hours ago

I am liking what I see so far: for future expansions (e.g. EnaSubmissions) I can replace the validation functions specific to each metadata entity by creating models. Also, the validation functions allow for value refactoring, which I find pretty useful for e.g. dates.
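
As an illustration of value refactoring for dates, here is a minimal sketch assuming pydantic v2 (`field_validator` is the v2 API). The model `SampleDates`, the field `collection_date` and the accepted input formats are hypothetical, not what the branch actually implements.

```python
from datetime import datetime

from pydantic import BaseModel, field_validator


class SampleDates(BaseModel):
    """Hypothetical model with a single date field."""
    collection_date: str

    @field_validator("collection_date", mode="before")
    @classmethod
    def normalise_date(cls, value):
        """Accept YYYY-MM-DD or DD/MM/YYYY and always store YYYY-MM-DD."""
        if not isinstance(value, str):
            raise ValueError("collection_date must be a string")
        for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
            try:
                return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
            except ValueError:
                continue  # try the next accepted format
        # A ValueError raised here is reported by pydantic as a validation error.
        raise ValueError(f"unrecognised date format: {value!r}")
```

The validator both checks the value and rewrites it, so the stored value comes out in one consistent format regardless of which accepted format went in.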

Now I can also provide a Notebook on how to validate your own samples against checklists not in BSD! Pretty cool.

ESapenaVentura / biobroker

Investigate if pydantic would be useful for metadata validation #20