Closed bruno-f-cruz closed 6 days ago
Using the new codegen, if I just add Modality.OPHYS pointing to the _Pophys() model is that sufficient to support this?
Naively I would say no. Assuming I understood what you suggested, from a technical point of view, the abbreviation map would somewhat break since only the abbreviation of the last entry would be added
Modality.abbreviation_map = {m().abbreviation: m() for m in Modality.ALL}
At a more conceptual level, I also think that the biggest point is NOT making it compatible at the level of the python API but instead at the level of deserialization. In fact, I think we should slowly deprecate the use of OPHYS in the python API but support the deserialization of the OPHYS into POPHYS.
You can check the tests on the open PR to have a sense what cases I believe we should support. (https://github.com/AllenNeuralDynamics/aind-data-schema-models/pull/61/files#diff-a238cb580608af7c9b598c6b5bb3bacf973927b080ad7c8919e0ea104c1947cd)
Got it thanks for the clarification -- in that case it might be easier to just run an upgrader on all the old metadata and replace the variable wherever we can find it. Unless there are places that OPHYS was used outside of the metadata but that need to be deserialized into one of the metadata classes, then we do need to implement as you've done here.
Ye this was exactly the reason behind this PR. Other packages will reference models in this library. For instance, the watchdog was using it to allow users to separate data by modality. Unfortunately, it started with an old version of models package and some data couldn't be deserialized. We ended up implementing this fix at the level of the watchdog anyway since, for our use, https://github.com/AllenNeuralDynamics/aind-data-schema/issues/960 was the real killer and we have to wrap the literal enum class anyway.
tldr, if you think that only the metadata deserializer will need to worry about this, i wouldn't even add the extra field and leave it as is. If you want to ensure backwards compatibility of other tools that may have been using this field, I think the deserialization is the most important use case.
Waiting on #94
After poking at this a bit I'm not sure this is possible, the discriminated Union forces you to rely on the abbreviation field and after an hour of messing with this I can't find a way to get it to deserialize properly. The issue is that the ONE_OF
field only has a limited list of options. You can get around this by adding an additional class with abbreviation="ophys" and you can get it to pick up that class during deserialization, but then if you run a model_validator on top of that to swap ophys->pophys it fails because it already knows the type of the class at that point and it expects abbreviation: Literal["ophys"].
One way around this would be to implement the validators in classes that include a call to Modality.ONE_OF (rig, session, quality_control, and data_description). I think this isn't worth it when we can go back and replace old uses of "ophys" -> "pophys".
I'll open a ticket in aind-data-migration to migrate all previous uses of "ophys" -> "pophys" in the metadata.
If there's additional concerns with metadata that isn't stored in DocDB let me know
A recent change made the previous tag
ophys
incompatible with the newpophys
modality. In order to support previously serialized schemas, it would be nice to have some custom logic to coerce the previous definition to the current one.This coercion mechanism should ideally make things compatible at the level of the python API (e.g. calling
Modality.OPHYS
) and at the level of deserialization. In both cases, the output should coerce to the new Modality.POPHYS object.