AllenNeuralDynamics / aind-data-schema-models

Data models used in aind-data-schema
MIT License

Consider supporting backwards compatibility between `ophys` and `pophys` #62

Closed. bruno-f-cruz closed this issue 6 days ago.

bruno-f-cruz commented 2 months ago

A recent change made the previous tag `ophys` incompatible with the new `pophys` modality. To support previously serialized schemas, it would be nice to have some custom logic that coerces the previous definition into the current one.

This coercion mechanism should ideally make things compatible at the level of the Python API (e.g. accessing `Modality.OPHYS`) and at the level of deserialization. In both cases, the result should be coerced to the new `Modality.POPHYS` object.
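
Roughly, in test form, the request could look like the checks below; `Modality.OPHYS` and the `abbreviation_map` lookup that comes up later in this thread are assumed entry points here, not guaranteed APIs.

```python
# Rough acceptance criteria (assumed entry points, not guaranteed APIs):
assert Modality.OPHYS == Modality.POPHYS                      # Python API level
assert Modality.abbreviation_map["ophys"] == Modality.POPHYS  # deserialization-side lookup
```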

dbirman commented 2 weeks ago

Using the new codegen, if I just add `Modality.OPHYS` pointing to the `_Pophys()` model, is that sufficient to support this?

bruno-f-cruz commented 2 weeks ago

Naively, I would say no. Assuming I understood what you suggested, from a technical point of view the abbreviation map would somewhat break, since only the abbreviation of the last entry would be added:

```python
Modality.abbreviation_map = {m().abbreviation: m() for m in Modality.ALL}
```
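
As a standalone illustration of that point (the class names here are made up, not the real generated models), a dict comprehension keyed on abbreviation can only hold one entry per key, so whichever model comes last in `ALL` wins:

```python
# Hypothetical models sharing an abbreviation; not the real generated classes.
class _PophysA:
    abbreviation = "pophys"

class _PophysB:
    abbreviation = "pophys"

ALL = (_PophysA, _PophysB)
abbreviation_map = {m().abbreviation: m() for m in ALL}

# Only the last entry survives for a repeated key:
assert isinstance(abbreviation_map["pophys"], _PophysB)
```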

At a more conceptual level, I also think the biggest point is NOT making it compatible at the level of the Python API but at the level of deserialization. In fact, I think we should slowly deprecate the use of `OPHYS` in the Python API while still supporting the deserialization of `OPHYS` into `POPHYS`.

You can check the tests on the open PR to get a sense of which cases I believe we should support. (https://github.com/AllenNeuralDynamics/aind-data-schema-models/pull/61/files#diff-a238cb580608af7c9b598c6b5bb3bacf973927b080ad7c8919e0ea104c1947cd)
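
For a rough sense of how this kind of deserialization-level coercion can be expressed in pydantic v2, here is a sketch with made-up member classes and a before-validator attached to the discriminated union; it is not what the PR does, and the real generated `Modality.ONE_OF` may not accommodate this directly (as the rest of the thread discusses).

```python
from typing import Annotated, Literal, Union
from pydantic import BaseModel, BeforeValidator, Field, TypeAdapter

# Made-up stand-ins for the generated modality models.
class _Pophys(BaseModel):
    abbreviation: Literal["pophys"] = "pophys"

class _Ecephys(BaseModel):
    abbreviation: Literal["ecephys"] = "ecephys"

def _coerce_legacy_ophys(value):
    # Rewrite the legacy "ophys" tag before the discriminator looks at it.
    if isinstance(value, dict) and value.get("abbreviation") == "ophys":
        return {**value, "abbreviation": "pophys"}
    return value

ONE_OF = Annotated[
    Union[_Pophys, _Ecephys],
    Field(discriminator="abbreviation"),
    BeforeValidator(_coerce_legacy_ophys),
]

# Old payloads tagged "ophys" now deserialize to the new pophys model.
print(TypeAdapter(ONE_OF).validate_python({"abbreviation": "ophys"}))
# -> _Pophys(abbreviation='pophys')
```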

dbirman commented 2 weeks ago

Got it, thanks for the clarification. In that case it might be easier to just run an upgrader on all the old metadata and replace the variable wherever we can find it. The exception would be if OPHYS was used outside of the metadata in places that still need to be deserialized into one of the metadata classes; in that case we do need to implement it as you've done here.

bruno-f-cruz commented 2 weeks ago

Yes, this was exactly the reason behind this PR. Other packages will reference models in this library. For instance, the watchdog was using it to allow users to separate data by modality. Unfortunately, it started with an old version of the models package and some data couldn't be deserialized. We ended up implementing this fix at the level of the watchdog anyway since, for our use case, https://github.com/AllenNeuralDynamics/aind-data-schema/issues/960 was the real killer and we have to wrap the literal enum class anyway.

TL;DR: if you think that only the metadata deserializer will need to worry about this, I wouldn't even add the extra field and would leave it as is. If you want to ensure backwards compatibility for other tools that may have been using this field, I think deserialization is the most important use case.

dbirman commented 1 week ago

Waiting on #94

dbirman commented 6 days ago

After poking at this a bit, I'm not sure this is possible. The discriminated Union forces you to rely on the abbreviation field, and after an hour of messing with this I can't find a way to get it to deserialize properly. The issue is that the ONE_OF field only has a limited list of options. You can get around this by adding an additional class with `abbreviation="ophys"`, and you can get it to pick up that class during deserialization, but if you then run a model_validator on top of that to swap ophys -> pophys it fails, because the type of the class is already known at that point and it expects `abbreviation: Literal["ophys"]`.
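
To make the first half of that concrete, a stripped-down sketch (made-up member classes, pydantic v2 assumed): a discriminated union only accepts the tags it was declared with, so a bare "ophys" payload is rejected before any validator on a member class can run.

```python
from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field, TypeAdapter, ValidationError

class _Pophys(BaseModel):
    abbreviation: Literal["pophys"] = "pophys"

class _Ecephys(BaseModel):
    abbreviation: Literal["ecephys"] = "ecephys"

one_of = TypeAdapter(
    Annotated[Union[_Pophys, _Ecephys], Field(discriminator="abbreviation")]
)

try:
    one_of.validate_python({"abbreviation": "ophys"})
except ValidationError as err:
    # Rejected outright: "ophys" is not among the declared discriminator tags,
    # so no per-class validator ever gets a chance to rewrite it.
    print(err)
```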

One way around this would be to implement the validators in the classes that include a call to Modality.ONE_OF (rig, session, quality_control, and data_description), as sketched below. I don't think this is worth it when we can just go back and replace old uses of "ophys" -> "pophys".
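
For reference, that alternative could look roughly like the following (a made-up container stands in for rig/session/quality_control/data_description, pydantic v2 assumed): a before-mode model validator on the container rewrites the tag before the discriminated modality field is resolved.

```python
from typing import Annotated, Literal, Union
from pydantic import BaseModel, Field, model_validator

class _Pophys(BaseModel):
    abbreviation: Literal["pophys"] = "pophys"

class _Ecephys(BaseModel):
    abbreviation: Literal["ecephys"] = "ecephys"

ONE_OF = Annotated[Union[_Pophys, _Ecephys], Field(discriminator="abbreviation")]

class DataDescriptionLike(BaseModel):
    # Hypothetical stand-in for the real classes that embed Modality.ONE_OF.
    modality: ONE_OF

    @model_validator(mode="before")
    @classmethod
    def _upgrade_legacy_ophys(cls, data):
        # Rewrite the legacy tag before the discriminator rejects it.
        if isinstance(data, dict):
            m = data.get("modality")
            if isinstance(m, dict) and m.get("abbreviation") == "ophys":
                data = {**data, "modality": {**m, "abbreviation": "pophys"}}
        return data

print(DataDescriptionLike.model_validate({"modality": {"abbreviation": "ophys"}}))
# -> modality=_Pophys(abbreviation='pophys')
```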

I'll open a ticket in aind-data-migration to migrate all previous uses of "ophys" -> "pophys" in the metadata.

If there are additional concerns about metadata that isn't stored in DocDB, let me know.