Swirrl / drafter

A Clojure service, and a client for it, for exposing data management operations to PMD

Implement Efficient Publishing in Drafter #272

Open ricroberts opened 6 years ago

ricroberts commented 6 years ago

In order to:

We should change drafter so that:

As part of designing/implementing this ticket, we should also consider how role-based permissions would interact with it, so we know we can add them later without too much rework.

ricroberts commented 6 years ago

Suggested initial implementation:

Imagine a live endpoint with 3 publicly named graphs, A, B, and C:

  live (L)
   |
  ABC

The graphset for this endpoint maps named graphs to storage graphs:

  A -> S1 
  B -> S2
  C -> S3 

When we query the live endpoint with the public named graphs, it will rewrite the queries and results so we actually query the storage graphs.
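As a rough illustration of the rewriting described above, here is a minimal Python sketch. The graph IRIs, the `LIVE_GRAPHSET` dict and both function names are hypothetical, and the regex substitution is only a stand-in: a real implementation would presumably rewrite the parsed query rather than its text.

```python
import re

# Hypothetical graphset for Live: public graph IRI -> storage graph IRI.
LIVE_GRAPHSET = {
    "http://example.org/graph/A": "http://example.org/storage/S1",
    "http://example.org/graph/B": "http://example.org/storage/S2",
    "http://example.org/graph/C": "http://example.org/storage/S3",
}

def rewrite_query(query, graphset):
    """Swap every public graph IRI written as <...> for its storage graph IRI."""
    def swap(match):
        iri = match.group(1)
        # IRIs that are not in the graphset are left untouched.
        return "<" + graphset.get(iri, iri) + ">"
    return re.sub(r"<([^<>\s]+)>", swap, query)

def rewrite_results(rows, graphset):
    """Map storage graph IRIs in result bindings back to public graph IRIs."""
    reverse = {storage: public for public, storage in graphset.items()}
    return [{var: reverse.get(val, val) for var, val in row.items()}
            for row in rows]

query = "SELECT * WHERE { GRAPH <http://example.org/graph/A> { ?s ?p ?o } }"
rewritten = rewrite_query(query, LIVE_GRAPHSET)
```

The client only ever sees the public names; the storage names appear only between the two rewriting steps.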

Scenario 1

Now imagine a scenario where someone makes a change to Graph A (e.g. by appending some data).

   L  Draftset1
   ^  ^
   |  |
   |  /
   |/ 
  ABC    A'BC

As we do in the current version, we make a copy of A at the point of appending the new data into a new storage graph.

The graphset for this draftset is then:

  A -> S4
  B -> S2
  C -> S3

To publish the change in this draftset to Live, we just replace the graphset used for Live with the one above. No copying of data is required.
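Scenario 1 can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the in-memory `store`, the triples, and the names `append_in_draft` and `publish` are not part of drafter.

```python
# Hypothetical in-memory triple store: storage graph name -> set of triples.
store = {
    "S1": {("ex:a", "ex:p", "1")},
    "S2": {("ex:b", "ex:p", "2")},
    "S3": {("ex:c", "ex:p", "3")},
}
live = {"A": "S1", "B": "S2", "C": "S3"}

def append_in_draft(graphset, graph, triples, new_storage):
    """Copy the graph's current storage graph, add the triples, and remap it."""
    store[new_storage] = store[graphset[graph]] | set(triples)
    # Return a new graphset; the one passed in is not modified.
    return {**graphset, graph: new_storage}

def publish(draft_graphset):
    """Publishing yields the new Live graphset; no triples are copied."""
    return dict(draft_graphset)

draftset1 = append_in_draft(live, "A", [("ex:a", "ex:p", "new")], "S4")
live_after = publish(draftset1)
```

The only data copy happens when the draft is first changed; publishing itself is a mapping swap, which is what makes it cheap.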

   L 
   ^  
   |
   A'BC
   |\  
   | \  
   |  |
   |  /
   |/ 
  ABC 

After the change, Live has the changed version of Graph A.

Scenario 2

Now imagine that this scenario is made more complicated by someone else making a change to graph B in Draftset2 shortly after Draftset 1 was created:

Draftset2    L    Draftset1
          ^  ^  ^
          |  |  | 
    AB'C   \ |  |
            \|  / A'BC
             |/ 
            ABC  

The graphset for Live:

  A -> S1
  B -> S2
  C -> S3

The graphset for Draftset1:

  A -> S4
  B -> S2
  C -> S3

The graphset for Draftset2:

  A -> S1
  B -> S5
  C -> S3

Depending on who merges/publishes to Live first, Live will end up missing either the changes to Graph B or the changes to Graph A, because the second publish replaces the whole graphset.
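The lost update can be reproduced with the three graphsets above. This is a minimal sketch that assumes publishing naively replaces Live's graphset wholesale; `publish_in_order` is a hypothetical helper.

```python
live = {"A": "S1", "B": "S2", "C": "S3"}
draftset1 = {"A": "S4", "B": "S2", "C": "S3"}
draftset2 = {"A": "S1", "B": "S5", "C": "S3"}

def publish_in_order(start, drafts):
    """Publish each draft in turn by replacing Live's graphset wholesale."""
    current = dict(start)
    for draft in drafts:
        current = dict(draft)
    return current

# Draftset2 first, then Draftset1: B falls back to S2, so the change to B is lost.
d2_then_d1 = publish_in_order(live, [draftset2, draftset1])

# Draftset1 first, then Draftset2: A falls back to S1, so the change to A is lost.
d1_then_d2 = publish_in_order(live, [draftset1, draftset2])
```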

There are 2 options for resolving conflicts:

OPTION 1) We could keep the client behaviour as it is now, by making graphsets inherit/cascade unchanged graph mappings from their parent endpoint's graphset (i.e. Live in our case).

The disadvantage of this is that (like now) changes can be silently inherited from Live, which might break your draftset without you noticing.

This means that if Draftset 2 were published first:

     L    Draftset1
     ^  ^
     |  |
  AB'C  |  A'B'C
    /|  |
   / |  |
  |  |  | 
   \ |  |
    \|  /  A'BC
     |/ 
    ABC  

Then Draftset 1 would inherit the change to Graph B.

The graphset for Draftset1 would become:

  A -> S4
  B -> S5
  C -> S3
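One way to get this cascading behaviour is for a draftset to store only the mappings it has changed and to resolve everything else against Live at read time. The sketch below assumes that representation; `effective_graphset` and the `*_overrides` dicts are hypothetical names.

```python
def effective_graphset(parent, overrides):
    """Mappings the draftset changed win; everything else is inherited."""
    return {**parent, **overrides}

live = {"A": "S1", "B": "S2", "C": "S3"}
draftset1_overrides = {"A": "S4"}  # Draftset 1 only changed Graph A
draftset2_overrides = {"B": "S5"}  # Draftset 2 only changed Graph B

# Publish Draftset 2 first: Live picks up B -> S5.
live = effective_graphset(live, draftset2_overrides)

# Draftset 1 now silently sees the new B, with no action from its owner.
draftset1_view = effective_graphset(live, draftset1_overrides)

# Publishing Draftset 1 keeps both changes.
live = effective_graphset(live, draftset1_overrides)
```

The same lookup that preserves Draftset 2's change is also what makes the inheritance silent, which is the disadvantage noted above.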

Then when we publish/merge/apply Draftset1 into Live, we don't lose the changes from Draftset 2:

  D2 L D1    
     ^  
     |
    A'B'C         No changes lost
     |\
     | \
     |  |
  AB'C  |  A'B'C  Changes inherited from Draftset 2
    /|  |
   / |  |
  |  |  | 
   \ |  |
    \|  /  A'BC
     |/ 
    ABC     

OPTION 2)

With MVP:

Conflict:

     L    Draftset1
     ^  ^
     |  |
  AB'C  |  A'BC (warn!)
    /|  |
   / |  |
  |  |  | 
   \ |  |
    \|  /  A'BC
     |/ 
    ABC  

Publish:

     L    Draftset1
     ^  
     |
     A'BC     Lose changes to B! (unless user manually fixes up their draftset)
     |\
     | \
     |  |
  AB'C  |  A'BC (warn!)
    /|  |
   / |  |
  |  |  | 
   \ |  |
    \|  /  A'BC
     |/ 
    ABC  

With Enhancement

Conflict:

     L    Draftset1
     ^  ^
     |  |
     | /|  A'B'C (user chooses to copy changes to B from Live)
     |/ |
  AB'C  |  A'BC  (warn!)
    /|  |
   / |  |
  |  |  | 
   \ |  |
    \|  /  A'BC
     |/ 
    ABC  

Publish:

     L    Draftset1
     ^  
     |
    A'B'C  Merged! :)
     |\     
     | \
     | /|  A'B'C (user chooses to copy changes to B from Live)
     |/ |
  AB'C  |  A'BC (warn!)
    /|  |
   / |  |
  |  |  | 
   \ |  |
    \|  /  A'BC
     |/ 
    ABC  
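The warning in Option 2 could be computed by remembering the Live graphset a draftset was created from and comparing it with Live at publish time. This is a sketch under that assumption; `base`, `changed_in_live` and `copy_from_live` are hypothetical names, and `copy_from_live` corresponds to the "user chooses to copy changes to B from Live" step in the enhancement.

```python
def changed_in_live(base, live):
    """Graphs whose Live mapping has moved since the draftset was created."""
    return sorted(g for g in live if live[g] != base.get(g))

def copy_from_live(draft, live, graph):
    """Enhancement: the user opts in to taking Live's version of one graph."""
    return {**draft, graph: live[graph]}

base = {"A": "S1", "B": "S2", "C": "S3"}       # Live when Draftset1 was created
live = {"A": "S1", "B": "S5", "C": "S3"}       # Live after Draftset2 was published
draftset1 = {"A": "S4", "B": "S2", "C": "S3"}

warnings = changed_in_live(base, live)         # graphs to warn the user about
fixed = copy_from_live(draftset1, live, "B")   # user resolves the warning
```

With the MVP only `warnings` would be surfaced; publishing `draftset1` unchanged would still revert B to S2.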

Option 1 (especially without the enhancement) requires no changes to the API (or data returned), or the clients (PMD).

Overall, I think I prefer option 2, as it's more predictable. But it's a change in behaviour (we could maybe do this at a later date?)

TODO: figure out how we model all this in RDF. What history / audit trail do we want to keep? How do we garbage collect unused storage graphs?
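On the garbage collection question, one possible starting point (an assumption, not a settled design) is plain reachability: a storage graph is collectable when no graphset we still care about maps to it. The function name and inputs below are hypothetical.

```python
def unreferenced_storage_graphs(all_storage_graphs, graphsets):
    """Storage graphs that no retained graphset points at."""
    referenced = {storage for gs in graphsets for storage in gs.values()}
    return set(all_storage_graphs) - referenced

# After Scenario 1 has been published, only Live's graphset remains.
live = {"A": "S4", "B": "S2", "C": "S3"}
garbage = unreferenced_storage_graphs(["S1", "S2", "S3", "S4"], [live])
```

If old graphsets are retained for history or audit purposes, they would need to be passed in as well, which keeps their storage graphs alive.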

RickMoynihan commented 6 years ago

Firstly, the write-up looks great. +1 for the ASCII art branch diagrams! :-)

There is a small problem with this statement on merge semantics:

"Then when we publish/merge/apply Draftset1 into Live, we don't lose the changes from Draftset 2"

"changes" also means being clear about the handling of DELETEs, and I don't think you've considered this. I think you meant the weaker statement "we don't lose the APPENDs from Draftset 2".

Specifically, I think the MVP we've been describing has just been a merge strategy of "all theirs, all ours, or UNIONing the graph" on a graph-by-graph basis. I think for an MVP this is ok, so long as users understand that DELETEs will get stomped, as we have no record of them.
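To make the DELETE problem concrete, here is a small sketch of the three per-graph strategies over sets of triples. The triples and the `merge_graph` function are made up for illustration.

```python
def merge_graph(ours, theirs, strategy):
    """Per-graph merge: take one side wholesale, or union the two."""
    if strategy == "ours":
        return set(ours)
    if strategy == "theirs":
        return set(theirs)
    if strategy == "union":
        return set(ours) | set(theirs)
    raise ValueError("unknown strategy: " + strategy)

t1, t2, t3, t4 = ("s", "p", "1"), ("s", "p", "2"), ("s", "p", "3"), ("s", "p", "4")

base = {t1, t2}
ours = {t2, t3}        # we DELETEd t1 and APPENDed t3
theirs = {t1, t2, t4}  # they APPENDed t4

merged = merge_graph(ours, theirs, "union")
# t1 is back in the merged graph: the union cannot tell "we deleted t1"
# apart from "we never had t1", so the DELETE is stomped.
```

Taking "ours" keeps the delete but drops their append of t4, and taking "theirs" does the reverse, so none of the three strategies preserves both.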

A more complete handling of conflicts involves storing the sequence of APPEND/DELETE operations inside an RDF Patch/Delta, and letting users resolve the conflict by specifying the order of these operations at merge time. Once we know the order of changes we can offer various levels of merge granularity, providing much more precise mechanisms for merging.
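A very reduced sketch of the operation-log idea follows. It does not use the actual RDF Patch syntax; the `("append", triple)` / `("delete", triple)` tuples and the `replay` function are assumptions chosen to show why ordering matters.

```python
def replay(base, operations):
    """Apply a sequence of ("append" | "delete", triple) operations in order."""
    graph = set(base)
    for op, triple in operations:
        if op == "append":
            graph.add(triple)
        elif op == "delete":
            graph.discard(triple)
        else:
            raise ValueError("unknown operation: " + op)
    return graph

t1, t2, t3, t4 = ("s", "p", "1"), ("s", "p", "2"), ("s", "p", "3"), ("s", "p", "4")
base = {t1, t2}

ours_log = [("delete", t1), ("append", t3)]
theirs_log = [("append", t4)]

# Non-overlapping logs: either order keeps the DELETE and both APPENDs.
merged = replay(base, theirs_log + ours_log)

# Overlapping logs: the order chosen at merge time decides the outcome.
delete_then_append = replay(base, [("delete", t1), ("append", t1)])
append_then_delete = replay(base, [("append", t1), ("delete", t1)])
```

Unlike the set-based strategies, the log records that t1 was deleted, so the merge can honour it.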

My proposal would be to leave the RDF patch/log implementation till later, but it's a feature that I think would unlock a lot of future capabilities, including improving our HA story.

ricroberts commented 6 years ago

Yeah, I didn't consider deletes specifically, but I think if we're doing it at graph (or whole-endpoint) granularity then it will just work.

But yeah, agree we should keep it simple in v1 and probably just stick with simulating the current behaviour.

RickMoynihan commented 6 years ago

See here for the proposed data model as an example TriG file, and here for the supporting vocabulary.

ricroberts commented 3 years ago

I had a thought about this recently. This might be obvious to others, but we could achieve most of the benefits here by making changes so that:

This is the minimal change we could make which would let us do instant publishing.

Added extras (not strictly required for instant publishing)