Open ricroberts opened 6 years ago
Suggested initial implementation:
- storage-graphs
- endpoints (called draftsets in the current api) have a name, description etc.
- graph-sets to collect storage graphs together into 'union graphs' used by endpoints
- graph-sets also provide public names for the graphs in that union graph.

Imagine a live endpoint with 3 publicly named graphs, A, B, and C:
live (L)
|
ABC
The graphset for this endpoint maps named graphs to storage graphs
A -> S1
B -> S2
C -> S3
When we query the live endpoint with the public named graphs, it will rewrite the queries and results so we actually query the storage graphs.
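As a rough illustration of that rewriting, here is a minimal Python sketch. The IRIs, `LIVE_GRAPHSET`, `rewrite_query` and `rewrite_result_iri` are all hypothetical names invented for this example, and the textual regex substitution is a simplification; a real implementation would presumably rewrite a parsed query rather than a string.

```python
import re

# Hypothetical IRIs: the proposal only names the graphs A, B, C and S1, S2, S3.
LIVE_GRAPHSET = {
    "http://example.org/graph/A": "http://example.org/storage/S1",
    "http://example.org/graph/B": "http://example.org/storage/S2",
    "http://example.org/graph/C": "http://example.org/storage/S3",
}

# Matches an IRI written as <...> with no whitespace inside.
IRI_PATTERN = re.compile(r"<([^<>\s]+)>")


def rewrite_query(query, graphset):
    """Swap each public graph IRI in the query text for its storage graph IRI."""
    def swap(match):
        iri = match.group(1)
        # IRIs that are not public graph names are left untouched.
        return "<" + graphset.get(iri, iri) + ">"
    return IRI_PATTERN.sub(swap, query)


def rewrite_result_iri(iri, graphset):
    """Map a storage graph IRI in a result back to its public name."""
    storage_to_public = {storage: public for public, storage in graphset.items()}
    return storage_to_public.get(iri, iri)


query = "SELECT * WHERE { GRAPH <http://example.org/graph/A> { ?s ?p ?o } }"
rewritten = rewrite_query(query, LIVE_GRAPHSET)
print(rewritten)
```

The same graphset is used in both directions: public to storage on the way in, storage to public on the way out.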
Now imagine a scenario where someone makes a change to Graph A (e.g. by appending some data)
L Draftset1
^ ^
| |
| /
|/
ABC A'BC
As we do in the current version, we make a copy of A at the point of appending the new data into a new storage graph.
The graphset for this draftset
A -> S4
B -> S2
C -> S3
To publish this draftset's change to live, we just update the graphset used for Live to be the above. No copying of data is required.
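The copy-on-append and publish steps could be sketched like this in Python. The in-memory `storage` and `graphsets` dictionaries and the function names are assumptions made for illustration, not drafter's actual data model.

```python
# Hypothetical in-memory model: storage graph id -> set of triples.
storage = {
    "S1": {("s", "p", "a1")},
    "S2": {("s", "p", "b1")},
    "S3": {("s", "p", "c1")},
}

# Endpoint name -> graphset (public graph name -> storage graph id).
graphsets = {"live": {"A": "S1", "B": "S2", "C": "S3"}}


def create_draftset(name, parent="live"):
    """A new draftset starts with the same mappings as its parent."""
    graphsets[name] = dict(graphsets[parent])


def append(draftset, graph, triples, new_storage_id):
    """Copy the graph's current storage graph into a new one, then append.

    Simplification: this sketch copies on every append; copying only on
    the first change to a graph would be enough.
    """
    current = graphsets[draftset][graph]
    storage[new_storage_id] = set(storage[current]) | set(triples)
    graphsets[draftset][graph] = new_storage_id


def publish(draftset):
    """Publishing only replaces Live's graphset; no triples are copied."""
    graphsets["live"] = dict(graphsets[draftset])


create_draftset("draftset1")
append("draftset1", "A", {("s", "p", "a2")}, "S4")
live_before_publish = dict(graphsets["live"])
publish("draftset1")
```

The cost of `publish` depends on the number of graph mappings, not on the amount of data in the graphs.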
L
^
|
A'BC
|\
| \
| |
| /
|/
ABC
After the change, Live has the changed version of Graph A.
Now imagine that this scenario is made more complicated by someone else making a change to graph B in Draftset2 shortly after Draftset 1 was created:
Draftset2 L Draftset1
^ ^ ^
| | |
AB'C \ | |
\| / A'BC
|/
ABC
The graphset for Live
A -> S1
B -> S2
C -> S3
The graphset for Draftset1
A -> S4
B -> S2
C -> S3
The graphset for Draftset2
A -> S1
B -> S5
C -> S3
Because publishing replaces Live's whole graphset, the later publish wins: depending on who merges/publishes to Live first, Live will miss either the changes to Graph B or the changes to Graph A.
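The lost change can be reproduced with the three graphsets above. This is a small Python sketch; `publish` and `publish_in_order` are hypothetical helpers that model publishing as a wholesale replacement of Live's graphset.

```python
base = {"A": "S1", "B": "S2", "C": "S3"}        # Live when both draftsets were created
draftset1 = {"A": "S4", "B": "S2", "C": "S3"}   # changed Graph A
draftset2 = {"A": "S1", "B": "S5", "C": "S3"}   # changed Graph B


def publish(draftset_graphset):
    """Publishing replaces Live's graphset with the draftset's graphset."""
    return dict(draftset_graphset)


def publish_in_order(*draftsets):
    """Publish each draftset in turn and return the final Live graphset."""
    live = dict(base)
    for draftset in draftsets:
        live = publish(draftset)
    return live


d2_then_d1 = publish_in_order(draftset2, draftset1)
d1_then_d2 = publish_in_order(draftset1, draftset2)

print(d2_then_d1)  # B points at S2 again: Draftset2's change is lost
print(d1_then_d2)  # A points at S1 again: Draftset1's change is lost
```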
There are 2 options for resolving conflicts:
OPTION 1) We could keep the client behaviour as it is now, by making graph sets inherit/cascade non-changed graph mappings from their parent endpoint's graphset (i.e. Live in our case).
The disadvantage of this is that (like now) changes can be silently inherited from live, which might break your draftset without you noticing.
This will mean that if Draftset 2 was published first:
L Draftset1
^ ^
| |
AB'C | A'B'C
/| |
/ | |
| | |
\ | |
\| / A'BC
|/
ABC
Then Draftset 1 would inherit the change to Graph B
The graphset for Draftset1 would become:
A -> S4
B -> S5
C -> S3
Then when we publish/merge/apply Draftset1 into Live, we don't lose the changes from Draftset 2:
D2 L D1
^
|
A'B'C No changes lost
|\
| \
| |
AB'C | A'B'C Changes inherited from Draftset 2
/| |
/ | |
| | |
\ | |
\| / A'BC
|/
ABC
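One way to picture Option 1 is a draftset that stores only the mappings it has changed and resolves everything else through its parent at read time. The Python sketch below assumes that representation; `effective_graphset` and the `*_overrides` names are made up for the example.

```python
live = {"A": "S1", "B": "S2", "C": "S3"}

# Each draftset records only the graphs it has changed.
draftset1_overrides = {"A": "S4"}
draftset2_overrides = {"B": "S5"}


def effective_graphset(overrides, parent):
    """Unchanged graphs cascade from the parent; overrides take precedence."""
    return {**parent, **overrides}


# Draftset 2 is published first.
live = effective_graphset(draftset2_overrides, live)

# Draftset 1 now silently inherits the change to Graph B from Live.
draftset1_view = effective_graphset(draftset1_overrides, live)

# Publishing Draftset 1 keeps both changes.
live = effective_graphset(draftset1_overrides, live)
```

This matches the graphset listed above for Draftset1 after Draftset 2 is published (A -> S4, B -> S5, C -> S3), and it also shows where the silent inheritance comes from: the draftset never holds its own mapping for B.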
OPTION 2)
Conflict:
L Draftset1
^ ^
| |
AB'C | A'BC (warn!)
/| |
/ | |
| | |
\ | |
\| / A'BC
|/
ABC
Publish:
L Draftset1
^
|
A'BC Lose changes to B! (unless user manually fixes up their draftset)
|\
| \
| |
AB'C | A'BC (warn!)
/| |
/ | |
| | |
\ | |
\| / A'BC
|/
ABC
Conflict:
L Draftset1
^ ^
| |
| /| A'B'C (user chooses to copy changes to B from Live)
|/ |
AB'C | A'BC (warn!)
/| |
/ | |
| | |
\ | |
\| / A'BC
|/
ABC
Publish:
L Draftset1
^
|
A'B'C Merged! :)
|\
| \
| /| A'B'C (user chooses to copy changes to B from Live)
|/ |
AB'C | A'BC (warn!)
/| |
/ | |
| | |
\ | |
\| / A'BC
|/
ABC
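Reading Option 2 from the diagrams above, the draftset keeps its own full graphset, the user is warned when Live has moved on, and they can choose to copy a graph mapping across from Live before publishing. A minimal Python sketch under that reading follows; `changed_in_live` and `copy_from_live` are hypothetical names.

```python
base = {"A": "S1", "B": "S2", "C": "S3"}        # Live when Draftset1 was created
live = {"A": "S1", "B": "S5", "C": "S3"}        # Live after Draftset 2 was published
draftset1 = {"A": "S4", "B": "S2", "C": "S3"}   # unchanged by the publish: no inheritance


def changed_in_live(base_graphset, live_graphset):
    """Graphs whose mapping in Live differs from the draftset's starting point."""
    return sorted(
        graph for graph in live_graphset
        if live_graphset[graph] != base_graphset.get(graph)
    )


def copy_from_live(draftset, live_graphset, graph):
    """The user opts in to taking Live's current version of one graph."""
    updated = dict(draftset)
    updated[graph] = live_graphset[graph]
    return updated


warnings = changed_in_live(base, live)                # graphs to warn about
fixed_draftset1 = copy_from_live(draftset1, live, "B")

live_if_published_unfixed = dict(draftset1)           # changes to B are lost
live_if_published_fixed = dict(fixed_draftset1)       # both changes survive
```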
Option 1 (especially without the enhancement) requires no changes to the API (or data returned), or the clients (PMD).
Overall, I think I prefer option 2, as it's more predictable. But it's a change (we could maybe do this at a later date?)
TODO: figure out how we model this all in RDF. What history / audit trail do we want to keep? How do we garbage collect unused storage graphs?
Firstly the write up looks great. +1 for the ascii art branch diagrams! :-)
There is a small problem with this statement on merge semantics:
> Then when we publish/merge/apply Draftset1 into Live, we don't lose the changes from Draftset 2:
"changes" also means being clear about the handling of DELETEs, and I don't think you've considered this. I think you meant the weaker statement "we don't lose the APPENDs from Draftset 2".
Specifically I think the MVP we've been describing has just been a merge strategy of "all theirs, all ours, or UNIONing the graph" on a graph by graph basis. I think for an MVP this is ok, so long as users understand that DELETEs will get stomped, as we have no record of them.
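To make the DELETE problem concrete, here is a small Python sketch of the three per-graph strategies, with graphs modelled as sets of triples. The function name `merge_graph` and the strategy labels are assumptions for illustration only.

```python
def merge_graph(ours, theirs, strategy):
    """Merge one graph using an all-or-nothing or union strategy."""
    if strategy == "ours":
        return set(ours)
    if strategy == "theirs":
        return set(theirs)
    if strategy == "union":
        return set(ours) | set(theirs)
    raise ValueError("unknown strategy: " + strategy)


t1 = ("s", "p", "1")
t2 = ("s", "p", "2")
t3 = ("s", "p", "3")

base = {t1, t2}
ours = base - {t1}      # we deleted t1
theirs = base | {t3}    # they appended t3

merged = merge_graph(ours, theirs, "union")
print(t1 in merged)  # the deleted triple is back
```

With only the two end states to compare, the union cannot tell "we deleted t1" apart from "they added t1", so the delete is stomped.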
A more complete handling of conflict involves storing the sequence of APPEND/DELETE operations inside an RDF Patch/Delta, and letting users resolve the conflict by specifying the order of these operations at merge time. Once we know the order of changes we can offer various levels of merge granularity, providing much more precise mechanisms for merging.
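The idea of replaying recorded operations in a chosen order can be sketched as follows. This is plain Python with made-up `"add"`/`"delete"` labels, not RDF Patch syntax, and `replay` is a hypothetical helper.

```python
def replay(graph, operations):
    """Apply a sequence of (op, triple) pairs to a copy of the graph, in order."""
    result = set(graph)
    for op, triple in operations:
        if op == "add":
            result.add(triple)
        elif op == "delete":
            result.discard(triple)
        else:
            raise ValueError("unknown operation: " + op)
    return result


t1 = ("s", "p", "1")
base = {t1}

draftset1_log = [("delete", t1)]   # one user removes the triple
draftset2_log = [("add", t1)]      # another user (re)asserts it

delete_then_add = replay(base, draftset1_log + draftset2_log)
add_then_delete = replay(base, draftset2_log + draftset1_log)
```

The two orders give different graphs, which is why the user would need to specify the order at merge time, and why a log carries information that a comparison of end states does not.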
My proposal would be to leave the RDF patch/log implementation till later, but it's a feature that I think would unlock a lot of future capabilities. Including improving our HA story.
Yeah, I didn't consider deletes specifically, but I think if we're doing it at graph (or whole-endpoint) granularity then it will just work.
But yeah, agree we should keep it simple in v1 and prob just stick with simulating the current behaviour.
See here for the proposed data model as an example trig file and here for the supporting vocabulary
I had a thought about this recently. And this might be obvious to others but we could achieve most of the benefits here by making changes so that:
MOVE-ing data from the draft graphs into the live graphs and therefore have the same merge behaviour of 'last change wins, per graph').
This is the minimal change we could make which would let us do instant publishing.
Added extras (not strictly required for instant publishing)
In order to:
O(1) publish.
We should change drafter so that:
As part of designing/implementing this ticket, we should also consider/design how role-based permissions would interact with this, so we know we can add it later without too much rework.