joshuakarp closed this issue 2 years ago
Looking into this now.
From what I see, `VaultInternal` should have at least the static method `createVaultInternal` and the `destroy` method.
However, because we have to differentiate destroying the in-memory instance from the on-disk state, `destroy` and `stop` both matter. When we stop the `VaultManager`, what's most important is that we call `stop` on the `VaultInternal`. The `VaultManager` itself can have additional cleanup operations when this occurs.
So having both `VaultManager` and `VaultInternal` follow CDSS just enables both sides to manage their side-effect/stateful responsibilities.
However, there is one issue with the `openVault` call. One cannot open a vault that isn't already created, because you cannot start the instance. Therefore one might ask what the purpose of `openVault` is. I suggest it should behave like a getter, but one that throws an exception if the vault does not exist.
This also means there is a public interface to vaults, and this is returned to the caller of `VaultManager`.
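As a rough sketch of this getter-like behaviour, assuming a simplified in-memory object map (the real object map and `ErrorVaultUndefined` name are stand-ins for illustration):

```typescript
// Stand-in error for "vault does not exist"
class ErrorVaultUndefined extends Error {}

type VaultId = string;
type Vault = { id: VaultId };

class VaultManager {
  protected vaultMap: Map<VaultId, Vault> = new Map();

  public createVault(vaultId: VaultId): Vault {
    const vault = { id: vaultId };
    this.vaultMap.set(vaultId, vault);
    return vault;
  }

  // Getter-like: returns an existing vault, throws instead of creating state
  public openVault(vaultId: VaultId): Vault {
    const vault = this.vaultMap.get(vaultId);
    if (vault == null) {
      throw new ErrorVaultUndefined(`vault ${vaultId} does not exist`);
    }
    return vault;
  }
}
```

The point is that `openVault` never creates state; creation is the exclusive job of `createVault`.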
But as we have evolved the design of the vault manager, we also want vaults to be used within a `with` context, so that transactions can take place during mutation of the vaults. This is a more controlled way of manipulating vaults.
I've also moved the new RWLock into #266. That can be used for the vault map as well.
Currently the `fresh` parameter is essential to `VaultInternal`, unlike it being an optional property in other domains. It determines whether the `VaultInternal` is constructed using `git.init`, which initialises the git state, or whether it picks up existing state internally.
`fresh` was meant to say: start anew, and wipe out any existing state there.
Whether to load existing state or not is meant to be automatic. That is, upon starting, `VaultInternal` should check the EFS's state for the vault directory and git directory and just load the relevant things into memory; only if these do not exist would it then do the `git.init`.
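This automatic load-or-init behaviour can be sketched minimally. Here `FakeFS` stands in for the EFS and `gitInit` stands in for isogit's `git.init`; none of these are the real Polykey APIs:

```typescript
// Minimal filesystem stand-in for the EFS
type FakeFS = {
  exists(path: string): Promise<boolean>;
  mkdir(path: string): Promise<void>;
};

// Idempotent start: loads existing git state, or initialises fresh state
async function startVault(
  efs: FakeFS,
  vaultGitDir: string,
  gitInit: (dir: string) => Promise<void>,
): Promise<'loaded' | 'initialized'> {
  if (await efs.exists(vaultGitDir)) {
    // Existing git state: just load it into memory
    return 'loaded';
  }
  // No existing state: initialise fresh git state
  await efs.mkdir(vaultGitDir);
  await gitInit(vaultGitDir);
  return 'initialized';
}
```

Running it twice against the same directory initialises only once; the second call picks up the existing state.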
The only issue is that during `clone` operations, which create state, we would have to "pre-seed" the git state prior to running `VaultInternal.createVaultInternal`, so that it is aware the state already exists. I think this is how it is currently working anyway with `cloneVault`.
Moving to CDSS would also mean that the deliberation of `fresh` goes into `start` instead of `create`.
All of this means that isogit's functionality will be used in both `VaultManager` and `VaultInternal`, whereas before I thought that `git` would be isolated to `VaultInternal`.
It is possible to also move `git.clone` into `VaultInternal` by creating a second creation static method. The creation static methods are functions that exist outside the class instance; this is similar to how C++ and other languages have overloaded/multiple constructors. So cloning would just be a separate constructor, and an asynchronous one too.
So we may have:

- `VaultInternal.createVaultInternal()`
- `VaultInternal.cloneVaultInternal()`

These are 2 ways of constructing a `VaultInternal`: one creates it from scratch, which equates to `git init`; the other clones from elsewhere, which equates to `git clone`.
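The shape of these two asynchronous "constructors" can be sketched as follows. The `Promise.resolve()` placeholders stand in for the `git.init`/`git.clone` side-effects, and `origin` is an invented field for illustration:

```typescript
class VaultInternal {
  public readonly vaultId: string;
  public readonly origin?: string;

  // Constructor stays protected; creation goes through the static methods
  protected constructor(vaultId: string, origin?: string) {
    this.vaultId = vaultId;
    this.origin = origin;
  }

  // Equivalent of `git init`: create from scratch (or load existing state)
  public static async createVaultInternal(vaultId: string): Promise<VaultInternal> {
    await Promise.resolve(); // placeholder for git.init side-effects
    return new VaultInternal(vaultId);
  }

  // Equivalent of `git clone`: pre-seed state from a remote first
  public static async cloneVaultInternal(
    vaultId: string,
    remote: string,
  ): Promise<VaultInternal> {
    await Promise.resolve(); // placeholder for git.clone side-effects
    return new VaultInternal(vaultId, remote);
  }
}
```

Keeping the real constructor protected forces all side-effectful creation through the two static methods, which is the point of the "smart asynchronous creator" pattern.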
This also means EFS state is considered a dependency for `VaultInternal`, so `git.init` and `git.clone` will be placed into `createVaultInternal` and `cloneVaultInternal` respectively, and `fresh` is applied there in particular. Which probably means the `fresh` parameter won't exist in `start` at all.
I have updated the spec here. Any discussion of this aspect of the PR is done here.
After moving the metadata inside of `VaultInternal`, the only metadata the `VaultManager` would need is a mapping of `VaultName` -> `VaultId`. This is used by the methods:

- `destroyVault` removes the mapping.
- `listVaults` lists all of the `vaultName` -> `vaultId` mappings.
- `renameVault` updates the mapping.
- `getVaultId` returns the `VaultId` from the mapping.
- `getVaultMeta` would actually touch the `VaultInternal`'s metadata, but it's only used externally once, to get the vault name. So I think it can be safely removed, or moved into `VaultInternal`.

So instead of accessing the `VaultInternal`'s metadata from the `VaultManager`, the `VaultManager` can just maintain a `VaultName` -> `VaultId` mapping inside its DB. We could store the `VaultName` inside the `VaultInternal`'s metadata, but I don't think it's really needed.
Alternatively, if we feel that accessing the `VaultInternal`'s metadata directly is needed, we can replicate how js-encryptedfs works by using the sublevel prefixer method. I will need to explore this some more and test it.
Because each `VaultInternal` has its own DB level, this means it's a prefix.
If you want to know whether a level is filled or not, you can use `DB.count`. This returns the number of fields in the level, which you can use to know whether a `VaultInternal` exists or not.
However the VM also needs to be able to look up `VaultId` by `VaultName`. Ideally this should be solved by #188 and https://github.com/MatrixAI/js-db/issues/1. But for the meantime you would need a DB level that is `VaultName` -> `VaultId`.
Make sure you're locking these operations though!
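The name-to-id level and the methods it serves could be sketched like this. `VaultNameIndex` is a hypothetical name, and a `Map` stands in for the DB level (the real implementation would go through the DB with locking as noted above):

```typescript
// Sketch of the VaultName -> VaultId level; a Map stands in for the DB sublevel
class VaultNameIndex {
  protected level: Map<string, string> = new Map();

  public async setName(vaultName: string, vaultId: string): Promise<void> {
    this.level.set(vaultName, vaultId);
  }

  // getVaultId: look up the VaultId from the mapping
  public async getVaultId(vaultName: string): Promise<string | undefined> {
    return this.level.get(vaultName);
  }

  // renameVault: update the mapping, keeping the same VaultId
  public async renameVault(oldName: string, newName: string): Promise<void> {
    const vaultId = this.level.get(oldName);
    if (vaultId == null) throw new Error(`no vault named ${oldName}`);
    this.level.delete(oldName);
    this.level.set(newName, vaultId);
  }

  // listVaults: enumerate all name -> id pairs
  public async listVaults(): Promise<Array<[string, string]>> {
    return [...this.level.entries()];
  }
}
```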
I have a small problem. We want to do all the state creation during `start()` as per the spec. We need to create the metadata inside the DB at this point, so we need to know what the vault name and the remote details are at this point.
1. I could add them as parameters to the `start` method, but that seems like a bad idea, since we won't need to pass them unless we specify `fresh`, and I don't think we want the ability to set remote details and name when we start fresh.
2. I could add them as properties to the `VaultInternal`, but then we need to update the values in two places. This only applies when we update the name and remote details.
I think option 2 works better here, but can you think of a better option @CMCDragonkai?
Do we even need to clear the whole metadata when starting fresh? It seems we can just keep the vault name in the metadata while clearing everything else.
If `--fresh`, it really means fresh. So you should clear the data.
You do need to recreate state in `start`; the problem is that the DB level will not be set up properly if done in the `create*` static methods.
Then with `start` you'd have all these parameters like `vaultName` and such. These should have some meaning behind them, and the various `setup` methods inside the `VaultInternal` should do the rest, similar to `KeyManager`.
So we check for permissions when a node receives a pull or clone request. I think this means we should be checking the permissions in the `vaultsGitInfoGet.ts` and `vaultsGitPackGet.ts` services.
I can see one problem with how the permissions are checked though. To confirm that we have permission for a vault, we provide our `NodeId` as part of the request message or metadata. But since `NodeId`s are public information, it seems possible to gain access to a vault's information so long as we can provide a `NodeId` with the correct permissions.
It seems that to prove we can access a vault we only have to provide something akin to a username, when we should be using some secret or token instead: either some proof that we have the access, or proof that we are the `NodeId` in question.
No, the `NodeId` isn't just supplied to the remote agent. It's checked via MTLS. So if the MTLS is verified, then the `NodeId` can be trusted.
But remember you're not taking the `NodeId` from the GRPC request message. You're taking it from the reverse proxy.
I think the original idea was that the reverse proxy adds the `NodeId` to the request metadata that is then sent to the `GRPCServerAgent`. A sort of `Forwarded-For` header that can be trusted. See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Forwarded-For and https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Forwarded.
However the reverse proxy doesn't "understand" GRPC frames atm, so I guess it's not able to do this.
Instead, what we have to do is pass the reverse proxy to the `GRPCServerAgent`, and allow the handlers to easily acquire the currently authenticated connection information.
Right now our `src/client/utils/utils.ts` has a `sessionInterceptor`.
We can create one for `src/agent/utils.ts` that does the same thing, but this time fetches information from the reverse proxy and then adds that metadata into the proxied request.
@tegefaulkes refer to this diagram https://excalidraw.com/#room=57543403694c95eba2e9,5qJFVQlpWc0vegUgEAQIpA regarding the session token usage, and https://github.com/MatrixAI/js-polykey/wiki/network. It is implemented through the `sessionInterceptor`. I think we need to add a new one called `connectionInfoInterceptor`. It should call `ReverseProxy.getConnectionInfoByEgress()` automatically and augment the request message's leading metadata. The new metadata should use `Forwarded-For` or related headers, as it's basically the same idea as in HTTP/1.1 reverse proxies.
I just remembered that these are "client interceptors", they are not interceptors applied on the server side: https://github.com/MatrixAI/js-polykey/pull/296#issuecomment-989740159.
GRPC upstream has discussion about adding server interceptors: https://github.com/grpc/grpc-node/issues/419.
This is why the `authenticator` function exists. It plugs a function into the handlers so the handlers can authenticate easily.
Then the solution is to adapt the `authenticator` in `client/utils` to `agent/utils` instead. Because what you want here is a function that will give you the connection information.
```ts
type ConnectionInfo = {
  nodeId: NodeId;
  certificates: Array<Certificate>;
  egressHost: Host;
  egressPort: Port;
  ingressHost: Host;
  ingressPort: Port;
};
```
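A hypothetical sketch of what this adapted helper could look like: the handler resolves trusted connection info via an injected lookup that stands in for `ReverseProxy.getConnectionInfoByEgress()`. The `ConnectionInfo` shape here is trimmed to a few fields, and `createConnectionInfoGetter`/`ErrorConnectionInfoMissing` are invented names for illustration:

```typescript
// Trimmed-down stand-in for the full ConnectionInfo type
type ConnectionInfo = {
  nodeId: string;
  egressHost: string;
  egressPort: number;
};

class ErrorConnectionInfoMissing extends Error {}

// Builds a function that handlers can call to get trusted connection info,
// sourced from the reverse proxy rather than the (untrusted) request message
function createConnectionInfoGetter(
  getConnectionInfoByEgress: (host: string, port: number) => ConnectionInfo | undefined,
) {
  return (egressHost: string, egressPort: number): ConnectionInfo => {
    const info = getConnectionInfoByEgress(egressHost, egressPort);
    if (info == null) {
      // MTLS-verified connection info must exist for any proxied request
      throw new ErrorConnectionInfoMissing();
    }
    return info;
  };
}
```

The design mirrors the `authenticator` idea: the handler never parses identity out of the message; it asks the proxy for the MTLS-verified identity of the connection it arrived on.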
@tegefaulkes the discussion about this authentication problem should be in a different/new issue though.
I will create a new issue for this.
New issue created here #342
It seems unnecessary for the `NodeConnectionManager` to contain the `revProxy`. It's only used for the method `NCM.holePunchReverse()`, which just wraps `revProxy.openConnection()` and is only called in the `src/agent/service/nodesHolePunchMessageSend.ts` service. I feel the logic should be inlined within that service, and we can remove `revProxy` from the `NodeConnectionManager` altogether.
Do we want to update the locking in `VaultInternal` to use a `RWLock` as well?
The same RWLock instance can be shared between VaultManager and VaultInternal.
If you're going to have 2 locks, then the duties must be separated here:

- `VaultManager` RWLock: guards creation and destruction of the vault instance.
- Vault RWLock: guards use of the vault's contents; this lock is now dependent on the `VaultManager` RWLock read (it can only be held if the read lock is also held).
This way, any GRPC service handlers or domain methods can still access the vault instance concurrently without blocking each other, and blocking only occurs when you are creating or destroying the vault.
This means that if any context is holding a Vault RWLock, it is necessarily also holding a `VaultManager` RWLock read, which would then block any creation or destruction routines of the vault in `VaultManager`.
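A minimal sketch of this nesting, assuming a simplified promise-based `RWLock` (not the #266 implementation): using a vault always takes the manager's read lock first, so `createVault`/`destroyVault` taking the write lock block until all vault usage drains.

```typescript
// Simplified counting read-write lock, for illustration only
class RWLock {
  protected readers = 0;
  protected writing = false;
  protected waiters: Array<() => void> = [];

  public async acquireRead(): Promise<() => void> {
    while (this.writing) await new Promise<void>((r) => this.waiters.push(r));
    this.readers++;
    return () => { this.readers--; this.wake(); };
  }

  public async acquireWrite(): Promise<() => void> {
    while (this.writing || this.readers > 0) {
      await new Promise<void>((r) => this.waiters.push(r));
    }
    this.writing = true;
    return () => { this.writing = false; this.wake(); };
  }

  protected wake(): void {
    const ws = this.waiters;
    this.waiters = [];
    for (const w of ws) w();
  }
}

// Using a vault always nests inside the manager's read lock
async function withVault<T>(
  managerLock: RWLock,
  vaultLock: RWLock,
  f: () => Promise<T>,
): Promise<T> {
  const releaseManager = await managerLock.acquireRead();
  try {
    const releaseVault = await vaultLock.acquireWrite();
    try {
      return await f();
    } finally {
      releaseVault();
    }
  } finally {
    releaseManager();
  }
}
```

Because readers of the manager lock don't exclude each other, vault usage stays concurrent; only lifecycle operations (which take the manager write lock) serialise against everything.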
This is mostly addressed now. Just blocked by #342
When checking the permissions for a vault, I think we don't want to reveal the existence of a vault, only that we don't have permissions for a vault that may or may not exist. That is to say, where permissions are concerned, we should only throw non-existing-vault errors after checking and throwing any permission errors. So a caller without permissions only ever sees permission errors, which hides the existence of vaults unless a node has permissions.
This assumes that if a vault doesn't exist then no permissions exist for it in the ACL. So I'll need to double check that the ACL/`VaultManager` is cleaning said permissions when we destroy or refresh vaults.
Small problem I'll have to look into. It seems that there is no real locking when creating a new vault. It checks if a vault already exists with that name, but nothing stops us from concurrently creating two vaults with the same name. That will be a problem. Adding a note to the spec.
I think the `VaultManager` will need to hold a lock on itself for creating new vaults until it adds the name to the name->id mapping in its database. Likely the lock will be needed every time we write and read that mapping.
That's what the object map pattern is for; it's supposed to prevent the situation where 2 vaults get created with the same name, etc.
Problem is, the `vaultId` is only created during `createVault`. It can't exist in the map before that. All other lifecycle stuff, such as open, close and destroy, respects the object map locking.
A solution to the concurrent vault creation, for now, is to have a `RWLock` for using the `vaultName` -> `VaultId` map in the database. This means vault creation will be serialised. This is a temporary fix until we get DB indexing working down the line (MatrixAI/js-db#1). Note that all other lifecycle operations will use the object map locking; this is specifically to deal with the problem of updating and reading vault names at the same time.
I'll add this into the spec in a moment.
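A sketch of the serialised creation path, with a minimal promise-chaining mutex standing in for the write side of the proposed `RWLock`, and a `Map` standing in for the DB mapping; names and ids are simplified stand-ins:

```typescript
// Minimal mutex: chains callers so they run one at a time
class Mutex {
  protected last: Promise<void> = Promise.resolve();

  public withLock<T>(f: () => Promise<T>): Promise<T> {
    const result = this.last.then(f);
    // Keep the chain alive whether f resolves or rejects
    this.last = result.then(() => undefined, () => undefined);
    return result;
  }
}

class VaultManager {
  protected names: Map<string, string> = new Map();
  protected namesLock = new Mutex();
  protected counter = 0;

  // The existence check and the insert happen under one lock,
  // so two concurrent creations of the same name cannot both succeed
  public createVault(vaultName: string): Promise<string> {
    return this.namesLock.withLock(async () => {
      if (this.names.has(vaultName)) {
        throw new Error(`vault ${vaultName} already exists`);
      }
      const vaultId = `vault-${++this.counter}`;
      this.names.set(vaultName, vaultId);
      return vaultId;
    });
  }
}
```

Without the lock, both callers could pass the `has` check before either `set` runs; with it, exactly one succeeds.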
I'm getting this very odd bug. When creating 3 or more vaults at a time we get an `ErrorEncryptedFSNotRunning` error from the EFS. This is happening during the `VaultInternal` start method when doing:

```ts
console.log('efs', this.efs[running]);
this.efsVault = await this.efs.chroot(this.vaultDataDir);
console.log('efsvault', this.efsVault[running]);
```
We end up with the log:

```
INFO:VaultInternal:Creating VaultInternal - z4yVwcUzsNm1pPQpUZNHjhc
INFO:VaultInternal:Starting VaultInternal - z4yVwcUzsNm1pPQpUZNHjhc
INFO:VaultInternal:Creating VaultInternal - z6DSRigZenUGF7mfub5BvdP
INFO:VaultInternal:Starting VaultInternal - z6DSRigZenUGF7mfub5BvdP
INFO:VaultInternal:Creating VaultInternal - z5XnzXrbxvmJqzxTg3pFYQo
INFO:VaultInternal:Starting VaultInternal - z5XnzXrbxvmJqzxTg3pFYQo
INFO:VaultInternal:Creating VaultInternal - zKZABTgSG8qUrREnipB4ngK
INFO:VaultInternal:Starting VaultInternal - zKZABTgSG8qUrREnipB4ngK
INFO:VaultInternal:Creating VaultInternal - zLdfm3YMNmXSEXeSZzs7B82
INFO:VaultInternal:Starting VaultInternal - zLdfm3YMNmXSEXeSZzs7B82
  console.log
    efs true
      at Object.start (src/vaults/VaultInternal.ts:274:13)
      at runMicrotasks (<anonymous>)
INFO:DB:Stopping DB
ErrorEncryptedFSNotRunning
```
Why is this happening? The EFS is running just before we call `efs.chroot`, so why is that failing with `ErrorEncryptedFSNotRunning`? I'll have a look at the EFS source.
The debugger might help in this case too.
Currently we have an error called `ErrorVaultImmutable`, which is thrown when trying to mutate a vault that was cloned. Basically it means we can't mutate when the remote is defined. Conversely, we need an error for the complement, where we are trying to pull a vault but no remote is defined. To give these errors some symmetry, I'm renaming `ErrorVaultImmutable` to `ErrorVaultRemoteDefined` and adding `ErrorVaultRemoteUndefined`.
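The renamed pair could be sketched like this; the shared `ErrorVaults` base and the `assert*` helpers are assumptions for illustration, keyed off whether `remote` is defined:

```typescript
class ErrorVaults extends Error {}

// Thrown when mutating a vault whose `remote` is defined (cloned vaults are immutable)
class ErrorVaultRemoteDefined extends ErrorVaults {}

// Thrown when pulling a vault that has no `remote` defined
class ErrorVaultRemoteUndefined extends ErrorVaults {}

function assertMutable(remote?: string): void {
  if (remote != null) throw new ErrorVaultRemoteDefined();
}

function assertPullable(remote?: string): void {
  if (remote == null) throw new ErrorVaultRemoteUndefined();
}
```

The symmetry is that both errors describe the same `remote` field, just from opposite directions.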
This is mostly done for now. Some of the remaining tests require functioning vault mutating and committing.
Finished up the tests now.
## Specification

### CDSS
Currently, `VaultInternal` utilizes the create-destroy pattern. We're considering instead refactoring this into a create-destroy-start-stop pattern. This would allow us to align function usage between `VaultManager` and `VaultInternal` as follows:

| `VaultManager` | `VaultInternal` |
| --- | --- |
| `createVault` | `createVaultInternal` |
| `openVault` | `start` |
| `closeVault` | `stop` |
| `destroyVault` | `destroy` |
The meat of the logic should now be in 2 static methods of `VaultInternal`, representing "smart asynchronous creators" of `VaultInternal`. These should be:

- Creating a `VaultInternal` instance with new or existing state should use `VaultInternal.createVaultInternal`.
- First-time cloning of a vault should use `VaultInternal.cloneVaultInternal`.

Get all the isogit-related side-effects (client-side logic) into these static methods; the `VaultManager` would only manage any additional metadata.

The `start` method now has to set up internal state, specifically git state: it needs to initialize the repository and also perform the initial commit. However, it has to be idempotent, so if these actions are already done, it won't bother with them.

The `stop` cannot stop the EFS, because the EFS lifecycle is handled outside by `VaultManager`.

The `destroy` should then destroy the vault's internal state; however, additional destruction management would have to occur inside the `VaultManager`, which might mean it has to construct (and thus load into memory) the instance before it can destroy it.

The cloning and pulling code should work using `withF` and `withG` as used by `NodeConnectionManager`. The cloning and pulling will need to have access to the ACL domain to ensure that permissions are properly set. Because permission checking occurs inside `VaultManager` and not `VaultInternal`, and occurs before `VaultInternal` functions are executed, this implies that `VaultManager` has access to the ACL and is bound to the ACL. Which would mean the meat of the logic discussed above should be in `VaultManager`.

### Metadata
The `VaultInternal` now needs to maintain its own DB domain to store vault-related metadata. 2 new properties are relevant:

- `dirty` - a boolean that determines whether the vault is in a dirty state or not
- `remote` - if this contains a string `nodeId:vaultId`, then this vault should be considered a "remote" vault and must be immutable - see #309 and #253

The internal domain is better than storing a POJO in the `VaultManager`. That logic should be removed in favor of `VaultInternal` having its own internal domain, which is more efficient and more flexible.

Additional metadata along with the `dirty` boolean can be added to enable us to fix #252. This will be important to ensure internal consistency of our vaults in the presence of program crashes and also inadvertent OS shutdowns or just program killing.
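The two metadata properties could be sketched like this, with a `Map` standing in for the vault's own DB level (the real `VaultInternal` would go through its DB domain):

```typescript
// Sketch of the VaultInternal metadata domain
class VaultMetadata {
  protected db: Map<string, string> = new Map();

  public async setDirty(dirty: boolean): Promise<void> {
    this.db.set('dirty', String(dirty));
  }

  public async getDirty(): Promise<boolean> {
    return this.db.get('dirty') === 'true';
  }

  // remote is stored as the string `nodeId:vaultId`
  public async setRemote(nodeId: string, vaultId: string): Promise<void> {
    this.db.set('remote', `${nodeId}:${vaultId}`);
  }

  // A defined remote marks this vault as immutable
  public async isImmutable(): Promise<boolean> {
    return this.db.has('remote');
  }
}
```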
### Misc

The implementations of the functions in `VaultManager` would internally call the respective function above in `VaultInternal`. This would also allow us to add any extra functionality that needs to be separate from the `VaultInternal` functions into the `VaultManager`. For example, on creation of a vault, there are a few factors that need to be considered: `VaultInternal` has no notion of the creation/destruction locks used in the `ObjectMap` structure, so this can be safely handled in `VaultManager.createVault`/`destroyVault`.

Conversely, we should remove any instances where we perform similar logic between the two corresponding functions, and ensure it's centralized in a single place.
Locking will be handled via two `RWLock` locks: one lock for the object map in `VaultManager`, and one for the `VaultInternal` instance. The `VaultManager` `RWLock` will handle the life cycle of the `VaultInternal` instance: write locking will be used for the creation/destroy lifecycle, and read locking will be used with `withVaults` when accessing the vaults. The `VaultInternal` locking will be used for the respective `readF/G` and `writeF/G` functions.

Git repository: anything to do with git instantiation for the vault could also be centralised within `VaultInternal.createVaultInternal`.

To ensure the integrity of the vault name information, we need to apply locking when reading and writing to the `VaultName` -> `VaultId` mapping in the database. This is to ensure that when we check if a vault name already exists in the mapping, we can trust that this information is correct, and avoid concurrency issues with vault creation and renaming.

### Additional context
- `VaultManager` and functions found in `src/git/utils.ts`
- `VaultInternal` (from `commit` to end), as well as some discussion of CDSS with `VaultInternal`: https://vimeo.com/manage/videos/652700328

A note on usage of the CDSS pattern:

- `create...`: performs the dependency side-effects (i.e. creates any encapsulated dependencies before creating itself). That is, we can see this function as outside the instance of the class.
- `start`: performs the side-effects on the created instance (with the created instance coming from the constructor).

### Tasks
- Move `VaultInternal` to using the CDSS async-init pattern.
  - `createVaultInternal` constructor for creating a `VaultInternal` instance on new or existing state.
  - `cloneVaultInternal` constructor for first-time cloning of a vault.
  - `VaultManager` shouldn't touch isogit at all.
  - `start` method handles the creation of any state. This should be idempotent. This also handles any `db` domain creation.
  - `destroy` should destroy all `VaultInternal` state.
- Cloning and pulling using `withF` and `withG` as used by `NodeConnectionManager`.
- Metadata moved into `VaultInternal`:
  - `dirty` field for tracking if the git repo is in a dirty state.
  - `remote` containing string `nodeId:vaultId` for tracking the remote node and vault. This is also used to check if the vault is immutable.
- Update locking in `VaultManager` and `VaultInternal` to use `RWLock` locks:
  - `VaultInternal` lifecycle.
  - `VaultInternal` contents.
  - `VaultInternal` write/read locking for the read/write functions.
  - `VaultManager` git handling functions should be using both read locks.
  - `VaultInternal.pullVault` should respect the write lock.
- `vaultName` -> `vaultId` metadata.
- `VaultManager` object map life-cycle.
- `VaultManager` object map locking.
- `VaultInternal` life-cycle.
- `VaultInternal` locking.
- `VaultInternal.pullVault` locking