@dionmcm I don't understand the risk of using extension snapshots. Is this about avoiding ending up with rows from the extension package that should no longer be effective because they have been superseded?
The recommendation to not use snapshots perhaps depends on the import strategy?
In one approach, using Delta/Full packages is safer, because each version of the base edition and the extension can be imported in effective-time order to end up with the correct set of effective rows. This relies on importing the full history of each package.
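A minimal Python sketch of this first approach, assuming a simplified, hypothetical `Row` type that keeps only the component id, effectiveTime, active flag and moduleId (real RF2 rows have more columns). Rows from the base and extension Full files are replayed in effective-time order up to a chosen release date, each version overwriting the previous one:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical, simplified RF2 row: only the fields needed for this sketch.
    id: str              # component id
    effective_time: str  # RF2 effectiveTime (YYYYMMDD), so string order is date order
    active: bool
    module_id: str


def replay_full_history(packages, up_to):
    """Replay Full rows from every package in effective-time order.

    Only rows released on or before `up_to` are replayed. Each row overwrites
    the previous state of its component, so the result holds one row per
    component: its state as of `up_to`. Python's sort is stable, so if two
    rows share an id and an effective time, the row from the later package wins.
    """
    rows = [row for package in packages for row in package if row.effective_time <= up_to]
    store = {}
    for row in sorted(rows, key=lambda r: r.effective_time):
        store[row.id] = row
    return store


# Hypothetical data: component 100 is inactivated in a later base release.
base_full = [
    Row("100", "20220131", True, "base"),
    Row("100", "20230131", False, "base"),
]
extension_full = [Row("200", "20220731", True, "ext")]

snapshot = replay_full_history([base_full, extension_full], up_to="20220731")
print(snapshot["100"].active)  # True: the 20230131 inactivation is after up_to
```

The cost is the one noted below: the full history of every package has to be filtered by effective time and replayed in the right order.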
In an alternative approach, the base edition is already imported and an extension snapshot is imported with a row filter that skips any row for which the store already holds a version with a greater effective time.
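A sketch of this row filter, using the same kind of hypothetical `Row` type and a plain dict standing in for the store (both are assumptions for illustration, not any particular tool's API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical, simplified RF2 row.
    id: str
    effective_time: str  # YYYYMMDD
    active: bool
    module_id: str


def import_extension_snapshot(store, extension_snapshot):
    """Apply an extension Snapshot to a store that already holds the base edition.

    An extension row is skipped when the store already holds a row for the
    same component with a greater effective time; otherwise it is added or
    replaces the stored row (so on equal effective times the extension row wins).
    """
    for row in extension_snapshot:
        existing = store.get(row.id)
        if existing is not None and existing.effective_time > row.effective_time:
            continue  # the store already has a newer version of this component
        store[row.id] = row
    return store


store = {
    "100": Row("100", "20230131", True, "base"),
    "101": Row("101", "20220131", True, "base"),
}
extension_snapshot = [
    Row("100", "20220731", False, "ext"),  # older than the stored row: skipped
    Row("101", "20220731", False, "ext"),  # newer than the stored row: applied
]
import_extension_snapshot(store, extension_snapshot)
```

Component 100 keeps the base row only because that base row is newer; whether that is the right outcome depends on which base version the extension was built against.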
I'm really not sure which approach is "easier" for implementers. The first requires Full packages to be filtered by effective time and imported multiple times in the correct order. The second requires filtering based on existing rows. I suppose neither of these issues exists when using an edition snapshot!
Have I understood the issue, and can you see any problem with the second approach?
Yes @kaicode, the issue is that an extension Snapshot (as opposed to an edition Snapshot) is really like a patch or delta that needs to be applied to something else, with the risks involved in doing that correctly, whereas an edition Snapshot is a standalone, ready-to-use package.
The issue is really the complication, and risk, of correctly applying the extension Snapshot to the base Snapshot and recalculating the correct row to keep for each component. In theory, the "right" way to do that is to use the MDRS from the extension and the base edition to filter down to the right modules and rows, then find the latest state of each component within the MDRS module/effective-time constraints. That is basically the same as you'd do with a Full/Delta, just applied to a subset of rows.
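A rough sketch of that calculation, assuming the MDRS has already been reduced to a dict from each in-scope moduleId to the effective time of the version depended on. That reduction is a simplification: the real refset records each module dependency with source and target effective times. The `Row` type is the same hypothetical one as above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical, simplified RF2 row.
    id: str
    effective_time: str  # YYYYMMDD
    active: bool
    module_id: str


def mdrs_snapshot(full_rows, in_scope):
    """Calculate a Snapshot from Full rows, constrained by module dependencies.

    `in_scope` is a simplified stand-in for the MDRS: it maps each moduleId in
    scope (the extension module and the modules it depends on) to the effective
    time of the version of that module in scope. Rows from other modules, or
    newer than that version, are ignored; the latest remaining row per component
    wins (on an exact tie, the first row seen is kept).
    """
    snapshot = {}
    for row in full_rows:
        target = in_scope.get(row.module_id)
        if target is None or row.effective_time > target:
            continue  # module not in scope, or a version newer than the one depended on
        current = snapshot.get(row.id)
        if current is None or row.effective_time > current.effective_time:
            snapshot[row.id] = row
    return snapshot


full_rows = [
    Row("100", "20220131", True, "base"),
    Row("100", "20230131", False, "base"),    # newer base release, not depended on
    Row("200", "20220731", True, "ext"),
    Row("300", "20220131", True, "derived"),  # content with no MDRS dependency
]
snapshot = mdrs_snapshot(full_rows, in_scope={"ext": "20220731", "base": "20220131"})
```

Component 300 belongs to a module with no dependency in scope, so it is dropped; this is the behaviour that differs from the "keep the latest row per component" shortcut for derivative content.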
However, assuming this has already been done correctly and you're combining an extension Snapshot with the same base Snapshot package/version it was classified, calculated and created against, I think taking the row with the latest effective time as the merge-conflict resolution strategy would work as a shortcut under those assumptions. @lawley, do you see any issues with this?
The other edge case is where the base edition contains additional content not represented as a dependency in the MDRS (for example, derivative content): is that content in or out of scope for the resulting extension Snapshot when merged with its base? Calculating the Snapshot from the MDRS would remove such content, whereas the "keep the latest row per component" shortcut would retain it, resulting in different sets of content. But I think this is a bigger, existing issue with our Snapshot calculation rules; it's just that an edition Snapshot resolves all of this explicitly.
I suppose the point is that the concern with the extension Snapshot wasn't about what was easier for implementers, but about what is safe and consistent, so that everyone reproduces the same result: whether the work required to use an extension Snapshot will be understood and implemented consistently and correctly by everyone implementing it, existing and new over time. Perhaps edition Snapshots are a safer option.
This is a bit different for derivatives as opposed to extensions. Extensions are particularly prone to this complication because they need to change the state of components from "upstream" modules, for example retiring is-a relationships as part of NNF transitive reduction after classification. Derivatives just create their own new components that don't affect the state of "inherited" components, so they don't have this issue.
That's why I think derivatives work best as extension packages and extensions work best as edition packages.
There is no argument about extension snapshots vs edition snapshots; that is clear.
This PR and my question are about extension snapshots vs extension deltas/fulls. My point was that I don't think it's any harder to work with extension snapshots than with extension fulls. I actually think extension snapshots are more concise and convenient than extension fulls.
I would vote for this PR to be reverted.
Sorry I got confused with another ticket in my last (now deleted) comment.
So I think you're right that this depends on the import strategy, but I guess the point is that unless you know that, it would be easy for an implementer to do the simplest thing and get it wrong.
That isn't a risk with edition snapshots, because no special import strategy is required. With Delta/Full extension packages the risk feels lower, because they clearly need ingestion steps that anyone using them would have to investigate to figure out what to do. The Snapshots look simple enough that someone might not do that investigation.
All of that said, I'm not wedded to this, particularly if edition packages are available for these things and implementers are encouraged to use those first unless they know what they are doing. If the extension Snapshot doesn't exist, it's just one less chance for someone to pick it up and make a mistake.
> In an alternative approach the base edition is already imported and an extension snapshot can be imported with a row filter that skips any rows that already exist with a greater effective time in the store.
I think this is right, provided you're applying it to the right base edition. If not, the extension's component versions may be overridden by newer base edition versions and mess things up. This obviously doesn't happen with an edition Snapshot, but it also doesn't happen if you take an extension Full, apply it to a newer base edition Full and calculate the Snapshot using the MDRS. If you take the shortcut in that scenario, you'll get a different result from applying the Full and calculating the Snapshot with the MDRS.
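The divergence can be reproduced with a small, entirely hypothetical scenario: an extension built against base 20220131 inactivates a base relationship `R1`, and a later base release (20230131) updates `R1` again. The sketch compares the latest-row shortcut with a simplified MDRS-style calculation; the function names, the `Row` type and the dict standing in for the MDRS are all illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical, simplified RF2 row.
    id: str
    effective_time: str  # YYYYMMDD
    active: bool
    module_id: str


def latest_row_wins(base_snapshot, extension_snapshot):
    """Shortcut: merge two Snapshots, keeping the row with the greater effective time."""
    merged = {row.id: row for row in base_snapshot}
    for row in extension_snapshot:
        existing = merged.get(row.id)
        if existing is None or row.effective_time >= existing.effective_time:
            merged[row.id] = row
    return merged


def mdrs_snapshot(full_rows, in_scope):
    """Latest row per component, ignoring rows newer than the depended-on module version."""
    snapshot = {}
    for row in full_rows:
        target = in_scope.get(row.module_id)
        if target is None or row.effective_time > target:
            continue  # module not in scope, or newer than the version depended on
        current = snapshot.get(row.id)
        if current is None or row.effective_time > current.effective_time:
            snapshot[row.id] = row
    return snapshot


base_full_2023 = [
    Row("R1", "20200131", True, "base"),
    Row("R1", "20230131", True, "base"),  # newer base release touches R1 again
]
base_snapshot_2023 = [base_full_2023[-1]]
extension_full = [Row("R1", "20220731", False, "ext")]  # extension retires R1

# Shortcut against the newer base Snapshot: the newer base row overrides the extension.
shortcut = latest_row_wins(base_snapshot_2023, extension_full)
# MDRS-style calculation: the extension still depends on base 20220131.
by_mdrs = mdrs_snapshot(base_full_2023 + extension_full,
                        in_scope={"ext": "20220731", "base": "20220131"})

print(shortcut["R1"].active, by_mdrs["R1"].active)  # True False
```

Against the base version the extension was built on, both methods would agree; the mismatch only appears once the base is newer than the MDRS dependency.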
Thanks Dion. I understand your point of view now.
I think there is an almost equal risk of implementers consuming the delta or full packages incorrectly and not realising. Consuming any type of extension package correctly will require some guidance. It seems the best we can do is to provide the simplest mechanism to get it right, with the shortest and clearest explanation!
I believe using the snapshots is the best way to achieve this simplicity. I think in nearly all cases it's a simple date comparison, so I don't think the MDRS is even required. The greatest effective time wins.
The only scenario I am aware of where the MDRS may be needed is when a row has been published twice, in different modules but with the same effective time. I'm not sure how often this happens, but we can likely come up with a simple rule for it, for example letting the extension package win, which again avoids needing the MDRS and keeps implementation simple.
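A sketch of the proposed rule as a merge of two Snapshots without the MDRS. The `Row` type and function name are hypothetical, and returning the list of tie-broken ids is an illustrative addition: one way to measure how often the same-effective-time case actually occurs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    # Hypothetical, simplified RF2 row.
    id: str
    effective_time: str  # YYYYMMDD
    active: bool
    module_id: str


def merge_snapshots(base_snapshot, extension_snapshot):
    """Merge a base and an extension Snapshot without consulting the MDRS.

    Rule: the greater effective time wins; if both packages publish the same
    component with the same effective time, the extension row wins. The ids
    decided by that tie-break are returned too, so you can see how often it applies.
    """
    merged = {row.id: row for row in base_snapshot}
    ties = []
    for row in extension_snapshot:
        existing = merged.get(row.id)
        if existing is not None:
            if existing.effective_time > row.effective_time:
                continue  # base row is newer: keep it
            if existing.effective_time == row.effective_time:
                ties.append(row.id)  # same effective time: extension wins
        merged[row.id] = row
    return merged, ties


base = [Row("100", "20220131", True, "base"), Row("101", "20220131", True, "base")]
extension = [Row("101", "20220131", False, "ext"), Row("200", "20220731", True, "ext")]
merged, ties = merge_snapshots(base, extension)
print(ties)  # ['101']
```

This still assumes the extension Snapshot is being merged with the base version it was built against; it does nothing about a newer base overriding extension rows.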
Thoughts?
I've tried to express this, but I'm not sure I've done it well. Happy to take suggestions on rewording.