MatrixAI / Polykey-Docs

Documentation for Polykey
https://polykey.com/docs/
GNU General Public License v3.0

Create diagrams for new processes implemented in vaults refactoring MR #3

Open joshuakarp opened 2 years ago

joshuakarp commented 2 years ago

There are some residual diagramming requirements from the vaults refactoring MR.

All of these diagrams will also need to be integrated into some reference documentation for the entire vaults domain, based on the refactoring efforts.

Tasks

  1. [ ] iso-git exception handling for the cloneVault/pullVault functionality https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205#note_681727197 - may be a good idea to be done concurrently with the vault sharing testing
  2. [x] generalising the NodeConnection creation https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205#note_704400712 and the VaultMap (potentially may need something separate for the indexing operations for vault creation https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205#note_700844380 - see MatrixAI/Polykey#257)
  3. [x] vault lifecycle (state diagram - concurrently showing both in-memory state and EFS state) https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205#note_708692900
  4. [x] vault version allowable transitions https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205#note_692001869 current status here and further information in this thread https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205#note_697093105
  5. [ ] correct order of shutdown for resources (less specific to vaults refactoring - came up on the MR though: https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205#note_712570155) "some kind of diagram to show lifetime, or like a babushka doll"

CMCDragonkai commented 2 years ago

@joshuakarp can you create the new pages/or attach to existing pages in the wiki. First an info-dump, then clean up the wiki structure in MatrixAI/Polykey-Docs#4 and https://github.com/MatrixAI/Polykey/issues/5

CMCDragonkai commented 2 years ago

These diagrams will need to inform the standards by which future diagrams are drawn as well.

joshuakarp commented 2 years ago

For 2: started to generalise the NodeConnection creation diagram to a "Locked Object Creation" diagram (think this title suits this process well) as per https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205#note_708692900.

This should be prefaced by providing a generic structure for the ObjectMap. For example:

type ObjectMap = Map<ObjectId,
  {
    resource?: Object;
    lock: MutexInterface;
  }
>;

And then the following diagram describes the process for constructing/retrieving an object in one of these locking maps:

image

http://www.plantuml.com/plantuml/uml/VL11JiCm4Bpd5NDC3oXtAa4Hue04GkeFGZ9H3AuTrckL_XxNYL6WY9DtPsTclRCBseh6WwroKGadjbe1Pa3zu5HEC0ulhs_izBcTC7XPkiV-TWCTwL2V63OLC8ls33vAH_3J10rdESy-bspWwgPfX1h5GHPPqsoNOL0_vP8s4BNpHNKSZKt0a-_UOGA4bcrW-axgrZowFbFBoXaoG_NRylfUs2fXa-Fs1yAB1FBuQ7HSi--wZsZaBuDoLY7s_JS4zRD_7grtBEJzV5Xn_IUlabOvS9UUUB1V
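As a rough illustration of the "Locked Object Creation" process, here is a minimal TypeScript sketch. It assumes the usual get-or-create pattern: register the entry with its lock, acquire the lock, then construct the object only if it is still undefined. `SimpleMutex`, `getOrCreate` and the `create` callback are hypothetical names introduced for this sketch; the thread's `MutexInterface` is replaced by a tiny stand-in so the block is self-contained, and the exact steps in the diagram may differ.

```typescript
// Stand-in mutex (assumption): the thread uses MutexInterface from a mutex library.
class SimpleMutex {
  private tail: Promise<void> = Promise.resolve();

  // Resolves with a release function once every earlier holder has released.
  acquire(): Promise<() => void> {
    let release!: () => void;
    const next = new Promise<void>((resolve) => {
      release = resolve;
    });
    const ready = this.tail.then(() => release);
    this.tail = next;
    return ready;
  }
}

type ObjectId = string;

// Same shape as the ObjectMap above, made generic over the stored object.
type ObjectMap<T> = Map<ObjectId, { resource?: T; lock: SimpleMutex }>;

// Hypothetical helper: returns the existing object or constructs it under the lock.
async function getOrCreate<T>(
  map: ObjectMap<T>,
  id: ObjectId,
  create: () => Promise<T>,
): Promise<T> {
  // Register the entry (lock only) synchronously so concurrent callers share one lock.
  let entry = map.get(id);
  if (entry === undefined) {
    entry = { lock: new SimpleMutex() };
    map.set(id, entry);
  }
  const release = await entry.lock.acquire();
  try {
    // Re-check under the lock: another caller may have constructed it while we waited.
    let resource = entry.resource;
    if (resource === undefined) {
      resource = await create();
      entry.resource = resource;
    }
    return resource;
  } finally {
    release();
  }
}
```

With this shape, two concurrent calls for the same `ObjectId` run `create` only once; the second caller waits on the lock and then finds the object already defined.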

joshuakarp commented 2 years ago

For 3: a simple start to visualising the in-memory and EFS state of a vault, according to the states a vault can be in:

image

http://www.plantuml.com/plantuml/uml/VP0n2uCm48Nt_8h3tJ_WK8JIGeVgKEWkvg22IIGv1v7-z-RYMIkbdNAyx-Mzuyf0ZQVHzEhHQGGq0qsWCRI-6wXpLgbe88IiSd1lnfuoQ09KpkARr0DQr1zKX5a1YELYSuF6-IdnQneNr-QfBPpEDpRmA_IfOAqvzYxiQaIOVqKmbmZU5_ByiAvMGEinVabXIaJcoLZm0SYxwGjBPVpEmu95swMxE6pqJtQ9LiVZlm00

CMCDragonkai commented 2 years ago

Best test to see where the docs make sense is to try to explain it to others and record questions. Schedule time to do this quickly in Monday sprint planning.


CMCDragonkai commented 2 years ago

@joshuakarp have you pushed the diagrams into the relevant pages on the wiki?

If so, please link them up to this issue.

joshuakarp commented 2 years ago

My bad, should have linked them here.

  1. cloneVault/pullVault diagram still needs to be created. Will need to chat with @scottmmorris to get this done.
  2. Generalised locking mechanism for object creation has been added for Vault Lifecycle and Node Connections
  3. Vault lifecycle (state in EFS and in-memory) has been added to Vault Lifecycle
  4. Vault versioning state still needs to be worked out, but will end up going to Vault Versioning
  5. Our "babushka doll" of dependencies still needs to be figured out too. This is potentially going to be quite a big diagram. I'm not even sure that a big sequence diagram is going to be the best for this, but I'll try and mock one up this morning.

Note that these diagrams are still mostly drafts. Will get some feedback in our sprint meeting today on how they could be improved.

joshuakarp commented 2 years ago

Thinking about how best to structure 5.

For dependency injection specifically, we have the following resource: https://en.wikipedia.org/wiki/Dependency_injection#Structure

The sequence diagram on the right very clearly shows the order in which things are created, but is a little tricky to show how one resource is injected into another.

joshuakarp commented 2 years ago

Following on from the above diagram, I've started to do something similar:

http://www.plantuml.com/plantuml/uml/RP513e8m44NtFKMNckW5N1X08Wir8Ng22Vm04OiPAjZR2p5feR3zxxy_YmbQJQm_hrfGfgkED6JQrEO94nPGMeYCCOPn9AQvt1-7I1waGcyxvuPxuZpbvclyzWnwtXqTkCMwv-32ky3SI541NbWEUSwZAhAIAkKR5lpxcpZQJKQsrWzFLryTCskvPe9MKwqJfdVfApFeBevBmq2EzLy4KFdmgHy0

I don't really like how the dependency injections aren't inherently obvious though (i.e. what we need to inject on new) - they're just textual on the arrow, and don't use the objects on the diagram.

joshuakarp commented 2 years ago

From discussions in sprint meeting today, we can use a combination of two diagrams for 2 separate purposes:

  1. A sequence diagram to demonstrate the lifetimes of objects in Polykey, and the order in which things need to be constructed/destructed.
  2. A component diagram to demonstrate both the internally managed dependencies and the external references (the "babushka doll" diagram).

CMCDragonkai commented 2 years ago

The component diagram can show encapsulated (optional) deps by nesting the boxes, while arrows between boxes can be used for required dependencies (external) and therefore not managed by lifecycle functions.
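To make the distinction concrete, here is a small TypeScript sketch, under the assumption that an encapsulated dependency is created by its owner when none is supplied and is started/stopped by the owner's lifecycle functions, while an external dependency is injected and left alone. `Component`, `Owner` and `events` are hypothetical names for illustration only, not Polykey classes, and the methods are synchronous purely to keep the example short.

```typescript
// Records lifecycle calls so the ordering is visible.
const events: string[] = [];

// Hypothetical component with a start/stop lifecycle.
class Component {
  constructor(public readonly name: string) {}
  start(): void {
    events.push(`start ${this.name}`);
  }
  stop(): void {
    events.push(`stop ${this.name}`);
  }
}

// Hypothetical owner: one required external dependency, one optional encapsulated one.
class Owner {
  private readonly external: Component;
  private readonly encapsulated: Component;

  constructor(opts: { external: Component; encapsulated?: Component }) {
    // External (arrow between boxes): must be injected, never started or stopped here.
    this.external = opts.external;
    // Encapsulated (nested box): created internally when not supplied.
    this.encapsulated =
      opts.encapsulated !== undefined
        ? opts.encapsulated
        : new Component('encapsulated');
  }

  start(): void {
    // Inner doll first, then the owner itself.
    this.encapsulated.start();
    events.push('start owner');
  }

  stop(): void {
    // Reverse order of start; the external dependency is left running.
    events.push('stop owner');
    this.encapsulated.stop();
  }
}
```

In this sketch, whoever constructs the external dependency stays responsible for starting it before the owner and stopping it after the owner.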

joshuakarp commented 2 years ago

For 3.

Based on feedback from sprint meeting, have changed the vault lifecycle diagram to the following:

image

http://www.plantuml.com/plantuml/uml/ZO_1IiGm48RlUOfXB-gXFi0UP44KFBWU1E-X-PN0x4GoKt7VthH3rRfOlNJedo_p_TcfnMh3WODbOz1J7DY8ypFwOz_-sx51GvWcRVR5YGr5fNqHXF53M-eylnD3bSYKHIrAZoqbnFH5tTm--ivsKA1oPeJthFPfU7Y587spU11yh9euls7cbYvtQA3PSir55nOFWe-_t-FSRnP_RjTTqTp6jzr7YI-ebtr5uwVe_28uC-7ZlPzmWbejnrFQEvyk7zEPTcIbIcdf4lvlaHqa3GV-0000

joshuakarp commented 2 years ago

type ResourceMap = Map<ResourceId, {
    object?: Object;
    lock: MutexInterface;
}>;

Second attempt at 2. with this new structure in mind:

image

http://www.plantuml.com/plantuml/uml/ZP71JiCm44Jl-OevEX8qTwD2U-I0aFe78TvGW-C4UoqEY7z7k5lLKTJ0RUpPMU_pxYAtWTFWgYmgmFPGWUAOGaVOuY3ogPqshyJgV7uqTGq-aY-g3VNMawahODvdCxwyKSSFYqJoimnmUzyqfsA8qpHtucaSY5FmE1MShoEFKvRa8a59Uj7vysWTGUsPQPWOFABjFf8D13TsxpLiXFfL-NY9aJSvAQPan5wdB3aaiSPHaAooI0_njdZEOjC5QfbKVMcdPykBnEfBKZV8CDzbyO4SjP6oKrx_FQetjgyA9SDVMxBERs-H1oxVqyjnPsFglIoiYAp_WlxF1opXhu0Bj63ko9iqk1y0

I'm a little wary whether the distinction between resource and object is explicitly clear here. Perhaps it might be better to instead do:

entry = ObjectMap.get(ObjectId)

And then do entry.object? defined, undefined, etc.

CMCDragonkai commented 2 years ago

Suggest:

type Resource = { object?: Object; lock: MutexInterface };
type Resources = Map<ResourceId, Resource>;

That way we can refer to the Resource.object and Resource.lock.

Could wrap this up into a little library later...

joshuakarp commented 2 years ago

Easy, that makes sense to me.

scottmmorris commented 2 years ago

Untitled

This is an updated ROUGH diagram of the vault cloning process. @joshuakarp will probably take bits and pieces from this to create his own.

I think the pulling process should also have at least a partial diagram. Although the GRPC streaming process is exactly the same there are two key differences overall:

scottmmorris commented 2 years ago

Pulling

Here is a very similar one for pulling. The part in the GRPC connection box is exactly the same; the only difference is the before and after. Note that I haven't included any of the necessary setup for pulling a vault (needs to be cloned from a source first etc.)

Also I'm not too sure what error you get when you try to pull from a vault that doesn't match your history which would probably be useful to include in this diagram or a similar one.

CMCDragonkai commented 2 years ago

@scottmmorris are you able to turn the above into plantuml so it can be more easily edited/maintained, given that parts get changed over time?

joshuakarp commented 2 years ago

@scottmmorris and I chatted, and he was going to do these 2 rough diagrams, and I'd convert them to polished plantuml versions. This way, I don't have to try to deduce the pull/clone process from the source code, and it gives me a starting point to work from.

joshuakarp commented 2 years ago

One quick question, is this the kind of granular detail you expect from the polished sequence diagrams @CMCDragonkai? e.g. looking at the current boxes of numbered steps that @scottmmorris has done, should these be converted to internal steps in the sequence diagram? Or should it be more general?

CMCDragonkai commented 2 years ago

I'd say those diagrams could be broken down into different situations. It would make the diagrams smaller. Factor out the error cases into their own diagrams. But I haven't had a deep review, so you'll just have to make a judgement call.

joshuakarp commented 2 years ago

No problem, I'll figure it out.

scottmmorris commented 2 years ago

Here are two of the PlantUML diagrams: the first one shows the happy path for cloning and the second is an example of an error case (permissions). Personally, with the first one I think the detail of what git does on Agent B should be removed. It's more git protocol that can be found by looking into the git image.

image

Here is an example of the more simple happy path without too much extra low level detail

image

BTW I have the text versions of these diagrams, plus the original more complex diagram above, in PlantUML, so I can directly make edits/share if needed.

joshuakarp commented 2 years ago

Some notes from me regarding the diagrams:

Complex happy diagram

scottmmorris commented 2 years ago

Yep agree with a lot of those points.

About the HEAD: yes, it will always clone head unless we change the source code to specify different behaviour so I think its better to remove the wanted object id being sent and instead have a sentence or two describing this behaviour in the wiki. In terms of why it is a POST request I'm not too sure why it is done like that but this pattern is done internally by isomorphic git. i.e. The iso-git library will make the 'GET' and 'POST' calls to our supplied request object when necessary and we just handle what happens after. But it is important for these to be distinguished from each other because iso-git first needs a list of all the commits that are available and then I assume internally processes that against the commits in the current git directory to then make a POST request.

That's maybe why there is a little confusion on whether I should include the Get all V's commits from git refs. Because technically we have written that code but then you could make the case for including all the information about reading from the local git directory and comparing the commit history (which is all handled internally by isomorphic git). IMO we should exclude all that extra info and just have the returned list: v's commit OIDs.
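The GET-then-POST pattern described above can be sketched in TypeScript roughly as follows. This is a simplified illustration, not the real code: `HttpRequest`, `HttpResponse`, `RemoteVault` and `makeRequest` are hypothetical names, and iso-git's actual request interface differs (for instance, bodies are streamed rather than held in arrays). The point is only that the library decides when to call GET and POST, and the supplied request function decides what each one does.

```typescript
// Simplified request/response shapes (assumptions for illustration).
type HttpRequest = {
  url: string;
  method: 'GET' | 'POST';
  body?: Uint8Array[];
};

type HttpResponse = {
  url: string;
  method: 'GET' | 'POST';
  statusCode: number;
  statusMessage: string;
  body: Uint8Array[];
};

// Hypothetical stand-in for the remote agent at the other end of the GRPC stream.
interface RemoteVault {
  listCommitOids(): Promise<Uint8Array[]>;
  packCommitObjects(wants: Uint8Array[]): Promise<Uint8Array[]>;
}

// Builds the request function handed to the git library.
function makeRequest(remote: RemoteVault) {
  return async (req: HttpRequest): Promise<HttpResponse> => {
    let body: Uint8Array[];
    if (req.method === 'GET') {
      // First round trip: the list of commit OIDs available on the remote.
      body = await remote.listCommitOids();
    } else {
      // Second round trip: the commit objects the library asked for.
      const wants = req.body !== undefined ? req.body : [];
      body = await remote.packCommitObjects(wants);
    }
    return {
      url: req.url,
      method: req.method,
      statusCode: 200,
      statusMessage: 'OK',
      body,
    };
  };
}
```

Comparing the returned OIDs against the local git directory happens inside the library between the two calls, which is why that step does not appear in this sketch.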

Here is another iteration of the happy case diagram (just used very basic sample names which can be swapped out): image

joshuakarp commented 2 years ago

That's great, this is much clearer to me now. (One little typo on the send notification step: should be to abc).

In the wiki, this diagram would also benefit from some succinct textual description of the terminology used (e.g. "commit OIDs", "commit objects"). Just a quick description of what these refer to with iso-git.

Any other quick feedback/thoughts @CMCDragonkai?

If nothing else, then an almost identical one should be made for the pulling process (with the minor differences added). And some separate exceptional ones too.

CMCDragonkai commented 2 years ago

What's the def in Start GRPC to def?

joshuakarp commented 2 years ago

What's the def in Start GRPC to def?

Node ID

CMCDragonkai commented 2 years ago

Should make it clear that isogit's HTTP requests are all occurring within 1 single GRPC stream. At least that's what it looks like there. I think plantuml has a lifeline thing, that could be used as well.

image

For def you should really write NodeId: def.

scottmmorris commented 2 years ago

How is this? Should the response arrow first go to the GRPC stream and then in a separate arrow go back to Polykey Agent A? At the moment I have it as a single arrow going through the GRPC stream and to the Polykey Agent A

image

CMCDragonkai commented 2 years ago

Better, but can we bold VaultId: ... and NodeId: ... to indicate that they are not the same as the other text.

Or is Stream { ... } a representation of the message type?

joshuakarp commented 2 years ago

Perhaps do something like this?

image

where we have a dashed line for the response (after the GET and POST request from B to stream), and then the normal solid line from stream to A?

CMCDragonkai commented 2 years ago

Note that the lifeline can be on Keynode A and Keynode B, as the stream's lifetime is shared between the 2. No need to create a separate line in the middle.

joshuakarp commented 2 years ago

My only issue with having the lifeline of the stream specifically on the lines of Keynode A and Keynode B is that it suggests that it's the lifeline of A and B, as opposed to the stream.

CMCDragonkai commented 2 years ago

The lifelines are whatever you actually annotate them to be. See https://sparxsystems.com/resources/tutorials/uml2/sequence-diagram.html how they use lifelines depending on the context.

CMCDragonkai commented 2 years ago

Also: image

joshuakarp commented 2 years ago

Yeah that's true, this seems fine to me too

scottmmorris commented 2 years ago

Thoughts?

image

CMCDragonkai commented 2 years ago

Yea, that's a lot clearer. Is there a specific type used for commit objects and commit OIDs?

Do you want to say Response stream for both of them instead of Response list?

I would want to label that the GET and POST are part of isogit, or that the GRPC stream is for HTTP. Something about the fact that isogit is sending and receiving HTTP on the stream.

Another idea:

GET Request
GET Response
GET Response

So you know it's part of the same GET transaction. Also, are there actually 2 responses? Usually HTTP has 1 response, but it may encode multiple things in a single JSON document.

Btw, you use KeynodeA and KeynodeB, but if the node IDs are abc and def, you can just say NodeId: 'abc' and NodeId: 'def'. No need to say KeynodeA or KeynodeB.

And also just PolykeyAgent is fine, no need for suffix of A and B.

scottmmorris commented 2 years ago

There is no specific type for the commit objects and OIDs, they are just buffers.

Yes, there are actually two responses: the metadata containing the vault name and ID, and then the stream of OIDs. The stream of OIDs is sent straight to iso-git, whereas we use the metadata. I didn't know if there was a way to combine them into one response, because one is a stream and the other isn't (it's just a unary response).

I can't have two PolykeyAgents without differentiating them some way. Maybe instead I could remove the group and PolykeyAgent and just have the node as the participant? so it would have to be nodeabc because plantuml doesn't let you have spaces or special characters in the participant name which I think is a bit odd.

I added in a self call which explains that isomorphic git is using HTTP. It points directly to the lifecycle of the GET/POST requests, so that might be the best way of doing it.

image

joshuakarp commented 2 years ago

I can't have two PolykeyAgents without differentiating them some way. Maybe instead I could remove the group and PolykeyAgent and just have the node as the participant? so it would have to be nodeabc because plantuml doesn't let you have spaces or special characters in the participant name which I think is a bit odd.

You can do something like this in plantuml, so that the participants share the same name but use a different internal label when defining transitions:

participant PolykeyAgent as PolykeyAgentA
participant PolykeyAgent as PolykeyAgentB

scottmmorris commented 2 years ago

Nice, edited the diagram above to include that so I don't clutter this thread with too many images.

joshuakarp commented 2 years ago

async-init changes have had an impact on the order of creation diagram. i.e. concurrent order vs a dependency order

Is there a way to do this automatically with some kind of software? (e.g. like the boot dependency diagram of ordering, systemd)

Would do this manually with execution of the PolykeyAgent, and order of construction.

For now, we can use the source code to do this manually (no need to do an actual diagram), just make a nested list of this for now. Use indentation for ordering of creation.
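Until a diagram exists, the ordering can also be derived mechanically from a hand-written dependency list. The TypeScript sketch below is one possible approach, assuming the dependencies form an acyclic graph: a depth-first topological sort gives a valid creation order, its reverse gives a shutdown order, and a recursive walk prints the indented nested list. The component names in `exampleGraph` are made up for illustration and do not reflect the actual PolykeyAgent construction order.

```typescript
// Each key lists the components that must be constructed before it.
type DependencyGraph = Record<string, string[]>;

// Hypothetical example graph; not the real Polykey dependency structure.
const exampleGraph: DependencyGraph = {
  agent: ['vaults', 'nodes'],
  vaults: ['db', 'keys'],
  nodes: ['keys'],
  db: ['keys'],
  keys: [],
};

// Depth-first topological sort: dependencies come before their dependents.
function creationOrder(graph: DependencyGraph): string[] {
  const order: string[] = [];
  const state = new Map<string, 'visiting' | 'done'>();
  const visit = (name: string): void => {
    const current = state.get(name);
    if (current === 'done') return;
    if (current === 'visiting') {
      throw new Error(`Dependency cycle at ${name}`);
    }
    state.set(name, 'visiting');
    for (const dep of graph[name] || []) visit(dep);
    state.set(name, 'done');
    order.push(name);
  };
  for (const name of Object.keys(graph)) visit(name);
  return order;
}

// Shutdown is the reverse of creation.
function shutdownOrder(graph: DependencyGraph): string[] {
  return creationOrder(graph).reverse();
}

// Indented list of a component and its dependencies (assumes no cycles).
function nestedList(graph: DependencyGraph, root: string, depth = 0): string[] {
  const lines = [`${'  '.repeat(depth)}- ${root}`];
  for (const dep of graph[root] || []) {
    lines.push(...nestedList(graph, dep, depth + 1));
  }
  return lines;
}
```

For example, `nestedList(exampleGraph, 'agent').join('\n')` produces the indented list, and `shutdownOrder(exampleGraph)` gives the order in which to stop things. Note that this yields a dependency order only; it says nothing about which constructions could run concurrently.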

CMCDragonkai commented 2 years ago

The GET Response Metadata should just be GRPC Leading Metadata. It's not part of the GET request response transaction.

scottmmorris commented 2 years ago

image image

These are the other two paths to make diagrams of. I'm not sure what other paths would be useful to include; things like cloning a vault that is undefined or trying to pull with merge conflicts would show similar flows to the InvalidPermissions diagram.

CMCDragonkai commented 2 years ago

If it's easy to do, then just copy and paste for the other flows but make the change.

joshuakarp commented 2 years ago

@scottmmorris could you also share the raw plantuml markup to make the above diagrams?

scottmmorris commented 2 years ago

Vault Pulling:

@startuml

title Vault Pulling

box NodeId: 'abc' #Lavender
participant PolykeyAgent as PolykeyAgentA
end box
box NodeId: 'def' #Beige
participant PolykeyAgent as PolykeyAgentB
end box

PolykeyAgentB<-]: Create vault:\n    **vaultName: 'v1'**\n    **vaultId: 'a1b2'**
PolykeyAgentB<-]: Share **vaultName: 'v1'** to **nodeId: 'abc'**
PolykeyAgentB->PolykeyAgentA: Send VaultShare notification to **nodeId: 'abc'**
[->PolykeyAgentA: Clone **vaultName: 'v1'** from **nodeId: 'def'**
PolykeyAgentB<-]: Add **secret 's1'** to **vaultName: 'v1'**
[->PolykeyAgentA: Pull **vaultName: 'v1'**
PolykeyAgentA->PolykeyAgentA: Retrieve **nodeId: 'def'** and **vaultName: 'v1'** from database
PolykeyAgentA->PolykeyAgentB: Start GRPC Stream to **nodeId: 'def'**
activate PolykeyAgentA
activate PolykeyAgentB
PolykeyAgentA->PolykeyAgentA: Isomorphic-git via HTTP 
activate PolykeyAgentA
PolykeyAgentA->PolykeyAgentB: GET Request {\n    vaultNameOrId: 'v1' OR 'a1b2'\n    nodeId: 'abc'\n    action: 'pull'\n}
PolykeyAgentB->PolykeyAgentA: Leading Metadata {\n    vaultName: 'v1'\n    vaultId: 'a1b2'\n}
PolykeyAgentB->PolykeyAgentA: GET Response stream [\n    **v1's** commit OIDs\n]
deactivate PolykeyAgentA
PolykeyAgentA->PolykeyAgentA: Isomorphic-git via HTTP 
activate PolykeyAgentA
PolykeyAgentA->PolykeyAgentB: POST Request Metadata {\n    vaultNameOrId: 'v1'\n}
PolykeyAgentB->PolykeyAgentA: POST Response stream [\n    **v1's** Commit Objects\n]
deactivate PolykeyAgentA
PolykeyAgentA<-PolykeyAgentB: Finish GRPC Stream to **nodeId: 'def'**
deactivate PolykeyAgentA
deactivate PolykeyAgentB
PolykeyAgentA<-PolykeyAgentA: Reload the working directory commit\nLoad vault from existing state and store in vault map\nWrite remote **vaultId: 'a1b2'** & **nodeId: 'def'**

@enduml

scottmmorris commented 2 years ago

Vault Cloning

@startuml

title Vault Cloning

box NodeId: 'abc' #Lavender
participant PolykeyAgent as PolykeyAgentA
end box
box NodeId: 'def' #Beige
participant PolykeyAgent as PolykeyAgentB
end box

PolykeyAgentB<-]: Create vault:\n    **vaultName: 'v1'**\n    **vaultId: 'a1b2'**
PolykeyAgentB<-]: Share **vaultName: 'v1'** to **nodeId: 'abc'**
PolykeyAgentB->PolykeyAgentA: Send VaultShare notification to **nodeId: 'abc'**
[->PolykeyAgentA: Clone **vaultName: 'v1'** from **nodeId: 'def'**
PolykeyAgentA->PolykeyAgentB: Start GRPC Stream to **nodeId: 'def'**
activate PolykeyAgentA
activate PolykeyAgentB
PolykeyAgentA->PolykeyAgentA: Isomorphic-git via HTTP 
activate PolykeyAgentA
PolykeyAgentA->PolykeyAgentB: GET Request {\n    vaultNameOrId: 'v1' OR 'a1b2'\n    nodeId: 'abc'\n    action: 'clone'\n}
PolykeyAgentB->PolykeyAgentA: Leading Metadata {\n    vaultName: 'v1'\n    vaultId: 'a1b2'\n}
PolykeyAgentB->PolykeyAgentA: GET Response stream [\n    **v1's** commit OIDs\n]
deactivate PolykeyAgentA
PolykeyAgentA->PolykeyAgentA: Isomorphic-git via HTTP 
activate PolykeyAgentA
PolykeyAgentA->PolykeyAgentB: POST Request Metadata {\n    vaultNameOrId: 'v1'\n}
PolykeyAgentB->PolykeyAgentA: POST Response stream [\n    **v1's** Commit Objects\n]
deactivate PolykeyAgentA
PolykeyAgentA<-PolykeyAgentB: Finish GRPC Stream to **nodeId: 'def'**
deactivate PolykeyAgentA
deactivate PolykeyAgentB
PolykeyAgentA<-PolykeyAgentA: Write remote **vaultId: 'a1b2'** & **nodeId: 'def'**\nLoad vault into vault map

@enduml

scottmmorris commented 2 years ago

Vault Cloning with invalid permissions

@startuml

title Vault Cloning Invalid Permissions

box NodeId: 'abc' #Lavender
participant PolykeyAgent as PolykeyAgentA
end box
box NodeId: 'def' #Beige
participant PolykeyAgent as PolykeyAgentB
end box

PolykeyAgentB<-]: Create vault:\n    **vaultName: 'v1'**\n    **vaultId: 'a1b2'**
[->PolykeyAgentA: Clone **vaultName: 'v1'** from **nodeId: 'def'**
PolykeyAgentA->PolykeyAgentB: Start GRPC Stream to **nodeId: 'def'**
activate PolykeyAgentA
activate PolykeyAgentB
PolykeyAgentA->PolykeyAgentA: Isomorphic-git via HTTP 
activate PolykeyAgentA
PolykeyAgentA->PolykeyAgentB: GET Request {\n    vaultNameOrId: 'v1' OR 'a1b2'\n    nodeId: 'abc'\n    action: 'clone'\n}
PolykeyAgentA<-PolykeyAgentB: ErrorVaultPermissionDenied
deactivate PolykeyAgentA
PolykeyAgentA->PolykeyAgentB: End GRPC Stream to **nodeId: 'def'**
deactivate PolykeyAgentA
deactivate PolykeyAgentB
[<-PolykeyAgentA: ErrorVaultPermissionDenied

@enduml