CMCDragonkai opened this issue 3 years ago
I've been writing up some of my own thoughts on vault and file schemas in our MR for vaults refactoring https://gitlab.com/MatrixAI/Engineering/Polykey/js-polykey/-/merge_requests/205#note_689647018. I'll synthesise my thoughts and bring them here for discussion.
When considering vault schemas, I've been thinking about what the "intention" of a vault is. I've come up with three different approaches:
Option 1: We store secrets of a specific structure and enforce that all secrets in the vault follow that structure (possibly with optional fields).
This would mean that the structure of a secret depends on the schema of the vault. That is, the individual components of a composite secret are defined at the vault level.
For example, suppose we had a vault schema represented as JSON:
```json
{
  "label": {
    "/mediatype": "text/plain"
  },
  "url": {
    "/mediatype": "text/plain"
  },
  "username": {
    "/mediatype": "text/plain"
  },
  "password": {
    "/mediatype": "text/plain"
  },
  "note": {
    "/mediatype": "text/plain"
  }
}
```
With this relational-database-like structure, our vault would then look as follows:
label | url | username | password | note |
---|---|---|---|---|
amazon | amazon.com.au | user1 | password1 | my amazon login |
twitter | twitter.com.au | user1 | password1 | my twitter login |
... | ... | ... | ... | ... |
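To make option 1 concrete, here is a minimal TypeScript sketch, assuming the vault schema is a map from field name to `{ "/mediatype": ... }` and each secret is one row of string fields. The names `VaultSchema`, `SecretRecord` and `validateRecord` are hypothetical and not part of js-polykey.

```typescript
// Hypothetical types for option 1: the vault schema fixes the fields of every secret.
type FieldSchema = { "/mediatype": string };
type VaultSchema = Record<string, FieldSchema>;
type SecretRecord = Record<string, string>;

const text: FieldSchema = { "/mediatype": "text/plain" };

// Same fields as the example vault schema above.
const loginVaultSchema: VaultSchema = {
  label: text,
  url: text,
  username: text,
  password: text,
  note: text,
};

const has = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Returns a list of problems; an empty list means the record (one row of the
// vault) fits the vault schema. Every field is treated as required here.
function validateRecord(schema: VaultSchema, record: SecretRecord): string[] {
  const errors: string[] = [];
  for (const field of Object.keys(schema)) {
    if (!has(record, field)) errors.push(`missing field: ${field}`);
  }
  for (const field of Object.keys(record)) {
    if (!has(schema, field)) errors.push(`unexpected field: ${field}`);
  }
  return errors;
}

const amazon: SecretRecord = {
  label: "amazon",
  url: "amazon.com.au",
  username: "user1",
  password: "password1",
  note: "my amazon login",
};
console.log(validateRecord(loginVaultSchema, amazon)); // []
```

Passing `{ label: "x" }` instead reports the four missing fields. Because the schema lives on the vault, every row in the table is checked against the same field list.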
Option 2: We store a specific set of secrets in our vault. I see this more like a directory of files, where we specify a list of secrets that must be present in the vault (some of which could be marked as optional).
For example, suppose we wanted a vault that stored all the sensitive information required for onboarding an employee at Matrix AI. We could have a JSON schema as follows:
```json
{
  "toggl-username": {
    "/mediatype": "text/plain"
  },
  "toggl-password": {
    "/mediatype": "text/plain"
  },
  "zoho-email": {
    "/mediatype": "text/plain"
  },
  "zoho-password": {
    "/mediatype": "text/plain"
  },
  "aws-access-key": {
    "/mediatype": "text/plain"
  }
}
```
Then, our vault would appear as follows:
id | secret |
---|---|
toggl-username | amazon.com.au |
toggl-password | password1 |
zoho-email | someone@matrix.ai |
zoho-password | password1 |
aws-access-key | abcd1234 |
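Under option 2 the check moves from fields to files. The following is a rough TypeScript sketch, assuming the vault is modelled as a `Map` from secret name to contents; `missingSecrets` and the `"/optional"` flag are hypothetical, the latter standing in for the optional secrets mentioned above.

```typescript
// Hypothetical sketch of option 2: the schema lists the secrets (files) a vault must hold.
type SecretEntry = { "/mediatype": string; "/optional"?: boolean };
type VaultSchema = Record<string, SecretEntry>;

const onboardingSchema: VaultSchema = {
  "toggl-username": { "/mediatype": "text/plain" },
  "toggl-password": { "/mediatype": "text/plain" },
  "zoho-email": { "/mediatype": "text/plain" },
  "zoho-password": { "/mediatype": "text/plain" },
  // Assumed "/optional" flag: this secret may be absent.
  "aws-access-key": { "/mediatype": "text/plain", "/optional": true },
};

// Returns the names of required secrets that the vault does not contain.
function missingSecrets(schema: VaultSchema, vault: Map<string, string>): string[] {
  return Object.entries(schema)
    .filter(([name, entry]) => !entry["/optional"] && !vault.has(name))
    .map(([name]) => name);
}

// The vault is modelled as a map from secret name (file name) to contents.
const vault = new Map<string, string>([
  ["toggl-username", "user1"],
  ["toggl-password", "password1"],
  ["zoho-email", "someone@matrix.ai"],
]);
console.log(missingSecrets(onboardingSchema, vault)); // [ 'zoho-password' ]
```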
Option 3 moves away from enforcing the structure of a secret at the vault level. Instead, we create schemas that specify the structure of a secret.
For example, a schema for a login secret (same as the vault schema from option 1):
```json
{
  "label": {
    "/mediatype": "text/plain"
  },
  "url": {
    "/mediatype": "text/plain"
  },
  "username": {
    "/mediatype": "text/plain"
  },
  "password": {
    "/mediatype": "text/plain"
  },
  "note": {
    "/mediatype": "text/plain"
  }
}
```
Or a schema for a credit card secret (if possible, the mediatype could be restricted to numeric values, etc.):
```json
{
  "label": {
    "/mediatype": "text/plain"
  },
  "cardholder-name": {
    "/mediatype": "text/plain"
  },
  "card-number": {
    "/mediatype": "text/plain"
  },
  "ccv": {
    "/mediatype": "text/plain"
  },
  "expiry": {
    "/mediatype": "text/plain"
  }
}
```
Then, at the vault level, the user chooses which type of secret to add to the vault (e.g. login, credit card). This could be an unrestricted add, where any kind of secret can be added to the vault.
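A rough TypeScript sketch of option 3, assuming a registry of named secret schemas and an unrestricted vault that is simply a list of typed secrets. `secretSchemas`, `TypedSecret` and `createSecret` are hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch of option 3: schemas describe secrets, not vaults.
type FieldSchema = { "/mediatype": string };
type SecretSchema = Record<string, FieldSchema>;
type TypedSecret = { type: string; fields: Record<string, string> };

const text: FieldSchema = { "/mediatype": "text/plain" };

// Assumed registry of named secret schemas, keyed by secret type.
const secretSchemas: Record<string, SecretSchema> = {
  login: { label: text, url: text, username: text, password: text, note: text },
  "credit-card": {
    label: text,
    "cardholder-name": text,
    "card-number": text,
    ccv: text,
    expiry: text,
  },
};

// Checks the given fields against the chosen secret type before creating the secret.
function createSecret(type: string, fields: Record<string, string>): TypedSecret {
  if (!Object.prototype.hasOwnProperty.call(secretSchemas, type)) {
    throw new Error(`unknown secret type: ${type}`);
  }
  const expected = Object.keys(secretSchemas[type]).sort();
  const given = Object.keys(fields).sort();
  const matches =
    expected.length === given.length && expected.every((name, i) => name === given[i]);
  if (!matches) throw new Error(`fields do not match the ${type} schema`);
  return { type, fields };
}

// An unrestricted vault: any registered type of secret may be added.
const vault: TypedSecret[] = [];
vault.push(
  createSecret("login", {
    label: "twitter",
    url: "twitter.com.au",
    username: "user1",
    password: "password1",
    note: "my twitter login",
  }),
);
```

Here the field check happens when the secret is created, so the vault itself does not need to know which types it holds.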
We could also incorporate vault schemas here, specifying the set of secrets we expect to be stored in a vault. This works the same way as option 2, except that this time the secrets being added have rigid schemas.
For example, we could then have a vault schema for Matrix AI onboarding:
```json
{
  "toggl": {
    "/secretschema": "login"
  },
  "zoho": {
    "/secretschema": "login"
  },
  "aws": {
    "/secretschema": "aws-credentials"
  }
}
```
Individual vaults can then be created for each team member as needed.
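A vault schema can refer to secret schemas by name through `"/secretschema"`. This hypothetical `checkVault` sketch only verifies that each expected entry exists and has the referenced type; validating the fields of each secret is assumed to be the secret schema's job, as in option 3.

```typescript
// Hypothetical sketch: a vault schema whose entries refer to named secret schemas.
type VaultEntry = { "/secretschema": string };
type VaultSchema = Record<string, VaultEntry>;
type TypedSecret = { type: string; fields: Record<string, string> };

const onboardingSchema: VaultSchema = {
  toggl: { "/secretschema": "login" },
  zoho: { "/secretschema": "login" },
  aws: { "/secretschema": "aws-credentials" },
};

// Reports entries that are missing or hold a secret of the wrong type.
// Field-level validation against each secret schema is left out of this sketch.
function checkVault(schema: VaultSchema, vault: Map<string, TypedSecret>): string[] {
  const problems: string[] = [];
  for (const [name, entry] of Object.entries(schema)) {
    const secret = vault.get(name);
    const expected = entry["/secretschema"];
    if (secret === undefined) {
      problems.push(`${name}: missing`);
    } else if (secret.type !== expected) {
      problems.push(`${name}: expected ${expected}, got ${secret.type}`);
    }
  }
  return problems;
}

// One vault per team member, all created from the same onboarding schema.
// Secret fields are abbreviated here.
const memberVault = new Map<string, TypedSecret>([
  ["toggl", { type: "login", fields: { username: "user1" } }],
  ["zoho", { type: "login", fields: { username: "someone@matrix.ai" } }],
]);
console.log(checkVault(onboardingSchema, memberVault)); // [ 'aws: missing' ]
```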
My perspective on these 3 options:
Vault schemas can be nested.
```json
{
  "dir1": {
    ...
  },
  "dir2": {
    ...
  }
}
```
We have to differentiate directories from files, which could be done with `/`, since it is not allowed in file names.
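One possible reading of this rule, sketched in TypeScript: since `/` cannot appear in a file name, any key starting with `/` (such as `"/mediatype"`) can be reserved for schema properties, and a node carrying `"/mediatype"` is treated as a file while any other node is a directory. `SchemaNode`, `isFile` and `listPaths` are hypothetical, and the trailing `/` on directory paths is only a convention of this sketch.

```typescript
// Hypothetical sketch: keys starting with "/" can never be file names,
// so they are reserved for schema properties such as "/mediatype".
type SchemaNode = { [key: string]: SchemaNode | string };

function isProperty(key: string): boolean {
  return key.startsWith("/");
}

// Assumed rule: a node with a "/mediatype" property is a file; anything else is a directory.
function isFile(node: SchemaNode): boolean {
  return typeof node["/mediatype"] === "string";
}

// Walks the schema and lists every path it describes, marking directories with a trailing "/".
function listPaths(node: SchemaNode, prefix = ""): string[] {
  const paths: string[] = [];
  for (const [key, child] of Object.entries(node)) {
    if (isProperty(key) || typeof child === "string") continue;
    const path = `${prefix}/${key}`;
    if (isFile(child)) {
      paths.push(path);
    } else {
      paths.push(`${path}/`);
      paths.push(...listPaths(child, path));
    }
  }
  return paths;
}

const schema: SchemaNode = {
  dir1: { username: { "/mediatype": "text/plain" } },
  dir2: { note: { "/mediatype": "text/plain" } },
  readme: { "/mediatype": "text/plain" },
};
console.log(listPaths(schema));
// → [ '/dir1/', '/dir1/username', '/dir2/', '/dir2/note', '/readme' ]
```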
So this means a directory would also have its own vault schema applied to it? For example, we could have a vault schema which specifies some files and a directory, and this directory would specify another vault schema?
So I found some more discussion hidden away in a comment on one of the mock-ups: https://gitlab.com/MatrixAI/Engineering/Polykey/polykey-design/-/issues/40/designs/Vault_Schema.png?version=163940
Notably, the following example was given for a vault schema for storing a username and password inside a directory:
```json
{
  "dirA": {
    "username": "text/plain",
    "password": "text/plain"
  }
}
```
This would create a vault with a directory structure like:
```
/dirA
/dirA/username
/dirA/password
```
But what if we want to have a vault that just has a username and password in the root directory (with no extra directory)? Then, the user needs to create a brand new schema for this:
```json
{
  "username": "text/plain",
  "password": "text/plain"
}
```
There's an unnecessary duplication of data here. The "username" and "password" fields between the schemas don't have any relation to each other (they're just labels for a chunk of text). That is, there's no indication (besides the label) that they're both storing the same kind of secret. Similarly, the user now has 2 vault schemas to manage which are doing very similar things.
I feel that option 3 from above is an improvement over this approach, but I'm interested to discuss this.
Vault schemas are just directory schemas.
These would be 2 different schemas, so they are independent.
Had a quick discussion with Roger about this. Some clarifications:
We need to remember that a "vault" can essentially be seen as a directory on the filesystem, where we store secrets (files), and embed version control inside it. As such, a directory inside a vault can analogously be seen as a nested vault.
Therefore, a vault schema is just a description for a directory. The vault schema should be minimal and flexible to reflect this filesystem structure.
For example, our username and password vault schema:
```json
{
  "username": "text/plain",
  "password": "text/plain"
}
```
The vault is then expected to contain exactly these 2 text files.
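A minimal sketch of that exact-match check, assuming the compact form used above, where a string value is a file's media type and an object value is a nested directory schema. `DirSchema`, `Dir` and `diffDir` are hypothetical names, and media types are not inspected.

```typescript
// Hypothetical compact directory schema: a string is a file's media type,
// an object is a subdirectory described by its own directory schema.
type DirSchema = { [name: string]: string | DirSchema };

// Hypothetical in-memory directory: file contents or nested directories.
type Dir = { [name: string]: string | Dir };

const has = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Returns the paths that break an exact match between a directory and its schema.
function diffDir(schema: DirSchema, dir: Dir, prefix = ""): string[] {
  const problems: string[] = [];
  for (const [name, expected] of Object.entries(schema)) {
    const path = `${prefix}/${name}`;
    if (!has(dir, name)) {
      problems.push(`missing ${path}`);
      continue;
    }
    const actual = dir[name];
    if (typeof expected === "string") {
      // Only file versus directory is checked; the media type itself is not.
      if (typeof actual !== "string") problems.push(`expected file at ${path}`);
    } else if (typeof actual === "string") {
      problems.push(`expected directory at ${path}`);
    } else {
      problems.push(...diffDir(expected, actual, path));
    }
  }
  // Exactly these entries: anything not named in the schema is reported.
  for (const name of Object.keys(dir)) {
    if (!has(schema, name)) problems.push(`unexpected ${prefix}/${name}`);
  }
  return problems;
}

const loginSchema: DirSchema = { username: "text/plain", password: "text/plain" };
console.log(diffDir(loginSchema, { username: "user1", password: "password1" })); // []
console.log(diffDir(loginSchema, { username: "user1", pin: "1234" }));
// → [ 'missing /password', 'unexpected /pin' ]
```

Because a directory schema is just an object, the same `loginSchema` can also be nested, e.g. `{ dirA: loginSchema }`, and is then checked recursively.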
Note that we can eventually utilise native features of JSON Schema to increase the expressive power of our schemas without much extra work, such as a strict flag for loosening the schema: providing optional fields, or specifying that a schema can have additional elements. In JSON Schema itself, these are expressed with the `required` and `additionalProperties` keywords.
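As a sketch of how that could look, assuming the compact directory schema above, the hypothetical `toJsonSchema` below emits a JSON Schema object. `required`, `additionalProperties` and `contentMediaType` are standard JSON Schema keywords; the `Options` shape and the mapping of files to string properties are assumptions.

```typescript
// Hypothetical sketch: translate a compact directory schema into a JSON Schema,
// so optional and extra files map onto standard JSON Schema keywords.
type DirSchema = { [name: string]: string | DirSchema };

interface Options {
  optional?: string[]; // names that may be absent
  allowAdditional?: boolean; // whether names not in the schema are allowed
}

type JsonSchema = {
  type: "object" | "string";
  contentMediaType?: string;
  properties?: Record<string, JsonSchema>;
  required?: string[];
  additionalProperties?: boolean;
};

function toJsonSchema(dir: DirSchema, opts: Options = {}): JsonSchema {
  const properties: Record<string, JsonSchema> = {};
  for (const [name, entry] of Object.entries(dir)) {
    properties[name] =
      typeof entry === "string"
        ? { type: "string", contentMediaType: entry } // a file's contents
        : toJsonSchema(entry); // nested directories use default options here
  }
  const optional = new Set(opts.optional ?? []);
  return {
    type: "object",
    properties,
    required: Object.keys(dir).filter((name) => !optional.has(name)),
    additionalProperties: opts.allowAdditional ?? false,
  };
}

const loginSchema: DirSchema = { username: "text/plain", password: "text/plain" };
console.log(JSON.stringify(toJsonSchema(loginSchema, { optional: ["password"] })));
```

With `optional: ["password"]`, only `username` ends up in `required`, and `additionalProperties: false` rejects any file that is not listed. The resulting object could then be handed to an off-the-shelf JSON Schema validator.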
Eventually, from the GUI's perspective, these schemas would be used to generate a form to create a vault. This would also mean we could use the properties from the JSON schema to enforce the validation logic at this level.
Similarly, note that a vault doesn't necessarily need to have a schema applied to it. For example, we could have an "unrestricted" vault (with no schema applied) that contains a collection of directories, with each of these directories having a different kind of schema applied to it.
Additionally, for a cloned vault, we'd need to consider whether we also clone the schema. The answer here is most likely yes.
Finally, schemas should be identified with a name and/or ID.
While vault schemas can be user-defined, we should also have some native schemas for users (for example, login, credit card, etc).
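Native schemas would also help with the `secrets add` command. At present, each field would be added as a separate secret, so adding a credit card would take 4 separate calls to the CLI. With a credit card schema, it could be a single call: `secrets add credit-card <cardholder name> <card number> <expiry date> <ccv>`.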
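A rough sketch of how positional arguments could be mapped onto a secret schema's fields in one call, assuming a fixed argument order per secret type. `positionalFields` and `parseSecretsAdd` are hypothetical and do not reflect the actual Polykey CLI.

```typescript
// Hypothetical sketch: map the positional arguments of
// `secrets add credit-card <cardholder name> <card number> <expiry date> <ccv>`
// onto the fields of a secret schema.

// Assumed positional order for each native secret type.
const positionalFields: Record<string, string[]> = {
  "credit-card": ["cardholder-name", "card-number", "expiry", "ccv"],
};

function parseSecretsAdd(args: string[]): { type: string; fields: Record<string, string> } {
  const [type, ...values] = args;
  if (type === undefined || !Object.prototype.hasOwnProperty.call(positionalFields, type)) {
    throw new Error(`unknown secret type: ${type}`);
  }
  const names = positionalFields[type];
  if (values.length !== names.length) {
    throw new Error(`${type} expects ${names.length} values: ${names.join(", ")}`);
  }
  // Pair each positional value with the field name at the same index.
  const fields: Record<string, string> = {};
  names.forEach((name, i) => {
    fields[name] = values[i];
  });
  return { type, fields };
}

const secret = parseSecretsAdd(["credit-card", "Jane Doe", "4111111111111111", "12/25", "123"]);
console.log(secret.fields["card-number"]); // 4111111111111111
```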
Specification
TBD
Additional context
76
4
Tasks
applySchema
TBD