AllenNeuralDynamics / aind-data-schema

A library that defines AIND data schema and validates JSON.
MIT License

Schema version field prevents validating older versioned data #1099

Open dbirman opened 2 weeks ago

dbirman commented 2 weeks ago

User story

As a user, I want to be able to validate models when the underlying data is valid. I can't do this because old data (which might be valid under the current schema) will always fail to validate because of the schema_version field.

Would be great to get a solution built into the schema for this, possibly annotating the version field with https://docs.pydantic.dev/2.0/api/functional_validators/#pydantic.functional_validators.SkipValidation
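For illustration, a minimal sketch of what the SkipValidation annotation does on a version field, assuming pydantic v2. The `Subject` model, its fields, and the version strings are hypothetical and not taken from aind-data-schema.

```python
from typing import Literal

from pydantic import BaseModel, SkipValidation


class Subject(BaseModel):  # hypothetical model, not from aind-data-schema
    # SkipValidation makes pydantic accept whatever value is supplied here,
    # so the Literal constraint is no longer enforced on input.
    schema_version: SkipValidation[Literal["1.0.0"]] = "1.0.0"
    subject_id: str


# A record written under an older (hypothetical) schema version.
old_record = {"schema_version": "0.3.1", "subject_id": "mouse-1"}
subject = Subject.model_validate(old_record)
```

Note that with this approach the old version string is kept on the validated object rather than being updated to the current one.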

Acceptance criteria

Sprint Ready Checklist

Notes

Removing the field doesn't work because it's required

dbirman commented 2 weeks ago

Actually, it's impossible to validate even with the field removed. I think it has to be explicitly set to the correct value, maybe pulled from the main schema?

bruno-f-cruz commented 2 weeks ago

Check here for a possible solution:

Implementation: https://github.com/AllenNeuralDynamics/Aind.Behavior.Services/blob/9629b92a08878c8adf3b58de3240017dc43196b9/src/DataSchemas/aind_behavior_services/session/__init__.py#L27

Test: https://github.com/AllenNeuralDynamics/Aind.Behavior.Services/blob/main/tests/test_schema_version_coercion.py

The strategy is to assume that the only incompatibility is a field that is coercible. That value is updated and the rest of the schema validation runs as is. If validation fails, the data is assumed to be incompatible; if it passes, it is assumed to be compatible. Once the data deserializes without throwing, the deserialized object is treated as coming from the latest library version.
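A rough sketch of this coercion strategy, assuming pydantic v2. This is not the linked implementation; the `Session` model, `CURRENT_VERSION`, and `is_compatible` are hypothetical names used only for illustration.

```python
from typing import Any, Literal

from pydantic import BaseModel, ValidationError, field_validator

CURRENT_VERSION = "1.0.0"  # hypothetical current schema version


class Session(BaseModel):  # hypothetical model
    schema_version: Literal["1.0.0"] = CURRENT_VERSION
    subject_id: str

    @field_validator("schema_version", mode="before")
    @classmethod
    def coerce_version(cls, value: Any) -> str:
        # Runs before the Literal check: overwrite whatever version came in.
        return CURRENT_VERSION


def is_compatible(data: dict) -> bool:
    """Treat data as compatible if it validates once the version is coerced."""
    try:
        Session.model_validate(data)
    except ValidationError:
        return False
    return True


# Old version string, but the remaining fields match the current schema.
old_ok = Session.model_validate({"schema_version": "0.3.1", "subject_id": "mouse-1"})
# Old version string and a required field is missing.
old_bad = is_compatible({"schema_version": "0.3.1"})
```

Here the validated object carries the current version, which matches the assumption that anything that deserializes is treated as coming from the latest library version.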