Closed dwhswenson closed 10 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (
f87e236
) 99.18% compared to head (42565aa
) 99.20%.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I agree wrt the rearrangement case, it sounds like this can be done by the tokenizable that holds Settings?
Yes; there's a test that serves as example in 0161938. This is also the downside of this approach: Any change we make to common settings will need to need to be changed in all protocols. And since it is the version of that parent object that matters, there's not much we can do within this approach to make it easier (a settings change that happens in between v2 and v3 OpenFE RBFE protocol could affect v1 to v2 in a Perses non-eq protocol; so the version number isn't the same and can't be used in for migration in a superclass).
Basically, think about the tree structure we have of objects nested in other objects: Protocol, Settings, Subsettings. For a reorganizational change, the version has to be changed somewhere on the tree that knows about both the "move from" and "move to" components -- Protocol knows everything, but earlier parts of the settings tree also might know both.
Now color each of the nodes in that tree by who owns them. Because we don't have a strict separation of ownership at some level of the tree (some settings/subsettings are gufe, some are protocol dev; Protocol is owned by protocol dev) what happens is that any reorganization change we make could affect versioning of something that we don't own.
In general, we've discussed the need to redesign settings in the future. I think we're going to have to settle for an imperfect solution here, include these issues in our settings redesign.
@dwhswenson give me a ping when this is ready for review
@richardjgowers : This should be ready for review.
Additional notes on changes here:
':version:'
key is considered as a "default value" for serialization purposes. Since it is constant within a version of the code, this makes sense to me. This also made it very easy to preserve gufe keys for most objects.LigandNetwork
s are not preserved. The reason for this is that the ligand network _to_dict
explicitly includes the full serialization of the ligands. This includes the version number for the SMC serialization. We could preserve the gufe key now by not including the version (doing that was -- you guessed it -- about 4 lines of code) but the downside of that approach was that a change to the serialization format of SmallMoleculeComponent
would then need us to also change the version of the LigandNetwork
serialization. With this approach, it should be easier to just use the SMC serialization versions to deserialize.Hello @dwhswenson! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
gufe/tests/test_serialization_migration.py
:Line 25:23: E203 whitespace before ':' Line 29:1: E302 expected 2 blank lines, found 1 Line 38:1: E302 expected 2 blank lines, found 1 Line 39:20: E203 whitespace before ':' Line 52:1: E302 expected 2 blank lines, found 1 Line 84:26: E261 at least two spaces before inline comment Line 175:1: E302 expected 2 blank lines, found 1
gufe/tokenization.py
:Line 159:1: E302 expected 2 blank lines, found 1 Line 192:1: E302 expected 2 blank lines, found 1 Line 217:1: E302 expected 2 blank lines, found 1 Line 230:51: W605 invalid escape sequence '.' Line 230:54: W605 invalid escape sequence '[' Line 230:57: W605 invalid escape sequence ']' Line 235:1: E302 expected 2 blank lines, found 1 Line 247:1: E302 expected 2 blank lines, found 1 Line 259:1: E302 expected 2 blank lines, found 1
@richardjgowers : This is ready for another look. Changes you requested were all docstrings/typing, so this should be good.
Yeah, maybe schema_version
? (I would normally prefer serialization_version
, but since spelling disagreements might lead to confusion....) 😆
Yeah schema version works for me
On Fri, 3 Nov 2023 at 17:28, David W.H. Swenson @.***> wrote:
Yeah, maybe schema_version? (I would normally prefer serialization_version, but since spelling disagreements might lead to confusion....) 😆
— Reply to this email directly, view it on GitHub https://github.com/OpenFreeEnergy/gufe/pull/227#issuecomment-1792852842, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACGSGB4DJZSBJOQ5HTSN453YCUSTLAVCNFSM6AAAAAA4MUGK2CVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOJSHA2TEOBUGI . You are receiving this because you were mentioned.Message ID: @.***>
@dwhswenson looks like we have consensus on _schema_version
moving forwards
GufeTokenizableTestMixin
to tests of migration processesWe decided that
GufeTokenizable
s need to be versioned, and that we need to provide a migration path to load old serialized representations of theGufeTokenizable
. This PR provides the core functionality to do that.Changes to
GufeTokenizable
_version
: Should return a sortable marking the serialization version.serialization_migration
: Takes a dict (possibly from another version of serialization, but otherwise suitable to be ingested byGufeTokenizable._from_dict
) and a version, and converts the dict to ensure that it can be handled by the current version of the code.Changes to the serialized representation of
GufeTokenizable
sNow serialized versions of
GufeTokenizable
s (described as a Python dict/JSON object) will carry the additional key":version:"
, which describes the version. Serialized representations without that key will be treated as if its value was1
. Like__module__
and__qualname__
, this is added/popped without intervention from contributors.Labelling versions
I'm proposing we use versions that are simple increasing integers. In principle, users could use anything. However, I think that semantic versioning has no meaning here (the only purpose would be to distinguish breaks that can no longer be handled by the code, which is probably as easily handled with a simple
version < oldest_supported_version
). Tying to the version of the package also makes it more tedious to handle updates: first, this means that it requires more careful attention to the version each new thing was added in. Second, this is a nightmare for dev installs, because they lie about version number.On reorganizing keys within nested structures
Although we identified this as target, I don't think this will be easily done with general
GufeTokenizable
s. The issue is that, in deserialization, we can use gufe keys to shortcut through the tree of nested object, gaining some performance. However, if we want to rearrange the tree of the nested structure, we need to reconstruct the entire structure, and the make the modifications. This is a significantly different process than the one we use now, and may be slower.That said, moving an attribute within a digraph is probably an indication that the original design was significantly flawed, which is not something I think we need to worry about with
GufeTokenizable
s. For example, I have a hard time imagining that we would want to transferTransformation.protocol.settings
toTransformation.chemical_systems.protocol_settings
or transfer a hypotheticalAlchemicalNetworkCreator.mapper.timeout
toAlchemicalNetworkCreator.timeout
.Where this will matter is in
Settings
, where we expect rapid flux in the near future. However,Settings
are notGufeTokenizable
s. Therefore, their "key-encoded" representation is still a nested dict, and the parentGufeTokenizable
(theProtocol
) can manage migrations. As of opening this PR, tools to facilitate that haven't been added; that will be a future step, if this PR isn't nixed.With this PR, changes in settings will require changes in all parent protocols. We could additionally create similar versioning of
Settings
, which would duck-type the same as the approach here, but build on tools within the JSON codecs instead of the coreGufeTokenizable
. This can be done, but would be part of a separate PR. That said, a change in organization (moving an attribute from cousin to cousin) will always require that the correction be handled at a common ancestor (grandparent). Purely local changes are not possible for a change that is fundamentally nonlocal.Sequential migrations
This isn't illustrated here, but the pattern to use is actually something like this:
This does a series of sequential migrations, and continues to support version 1, 2, and 3 with minimal code. If we find that the chain of versions gets too long, it might be worth creating some skips in there, but I kind of doubt it will ever be worth going to the "correct" solution of creating the directed network and finding the minimal path.