Shopify / ejson

EJSON is a small library to manage encrypted secrets using asymmetric encryption.
MIT License
1.33k stars 63 forks source link

Introduce schema version 2, including a secure per-secret identifier #84

Closed MrMarvin closed 3 months ago

MrMarvin commented 3 years ago

Rationale

With an ever growing collection of ejson documents across multiple repositories, managed by various teams, the burden to manage recurring (i.e. same secret in more than one file) seems to outweigh the benefits of having individual and completely unrelated cipher texts per encryption operation.

Example

Given an existing ejson document

[19:42:00] 🛒😺 :ejson$ cat /tmp/test.ejson
{
  "_public_key": "3f809a054114cb2565b465628c012999a46db9500ef11c53c677d72f87fc423a",
  "database_password": "very secret do not share",
  "otherbase_password": "very secret do not share"
}
[19:42:11] 🛒😺 :ejson$ ejson encrypt /tmp/test.ejson
Wrote 423 bytes to /tmp/test.ejson.
[19:42:18] 🛒😺 :ejson$ cat /tmp/test.ejson
{
  "_public_key": "3f809a054114cb2565b465628c012999a46db9500ef11c53c677d72f87fc423a",
  "database_password": "EJ[1:tyW3qutcFC8r3EROjkdzfKGM9Z4lFsoO7mqOmUmVBQ4=:8T+lgFLPLy/r/foSEIUjRdLFwIQlpmgU:KhHVzQAr0DnGdLBkGNlYTXr4Q5lWwIbPp8bmnDSLLN+ehAEUih7nfA==]",
  "otherbase_password": "EJ[1:tyW3qutcFC8r3EROjkdzfKGM9Z4lFsoO7mqOmUmVBQ4=:blyGcE6OYv0hyFdRP8rY+QL3ibg1NFY0:bIgQWhFVyA3mZfAYuVemCHPzXdqrzrzIim5rP0HWEp3wgEc/NtvlaA==]"
}

if we would be asked to rotate the database_password, because there is the suspicion of it having leaked or due to other policies or requirements, there is currently no way (for anyone without access to the private key) to notice that the very same secret is also used for otherbase_password. This can get especially tiresome if those two secrets are not within the same file, but in different files across a system landscape.

With the proposed new field added to the cipher text wire format, it becomes trivial to check for duplicate secret usage.

[19:49:58] 🛒😺 :ejson$ cat /tmp/test.ejson
{
  "_public_key": "3f809a054114cb2565b465628c012999a46db9500ef11c53c677d72f87fc423a",
  "database_password": "very secret do not share",
  "otherbase_password": "very secret do not share"
}
[14:14:36] 🛒😺 :ejson$ ./build/bin/darwin-amd64 encrypt /tmp/test.ejson
Wrote 553 bytes to /tmp/test.ejson.
[14:14:38] 🛒😺 :ejson$ cat /tmp/test.ejson
{
  "_public_key": "3f809a054114cb2565b465628c012999a46db9500ef11c53c677d72f87fc423a",
  "database_password": "EJ[2:jDxAEbb/hETVyTEd8UUQiEu1vyITDV0aZWhkpGcn+F4=:SYT5e65wzsGnOohswWAxMoXRzt2msszS:a5jP3gJhxEoFul2Fq0lqHYa89zk4E9zlHMAxL8fKC6YbP3ZAvqa8vQ==:2e6e6c65a58c79e86eb45f6b89ca31e2e52674faa451d73d1aa7dc20210cf616]",
  "otherbase_password": "EJ[2:jDxAEbb/hETVyTEd8UUQiEu1vyITDV0aZWhkpGcn+F4=:OYle5PUztAEas4xtwH705dgSqL3w1hnx:mK6YfjPeQ8FW9y+JsDsMYLS9SOjiKTrIBxYy0EX3uleteTe4YoUHKA==:2e6e6c65a58c79e86eb45f6b89ca31e2e52674faa451d73d1aa7dc20210cf616]"
}

Instead of the context dependent and vague "please rotate the database_password!", this allows very clear instructions, such as

replace every secret identified by 2e6e6c65a58c79e86eb45f6b89ca31e2e52674faa451d73d1aa7dc20210cf616 with the new database password.

This opens possibilities for a centralised entity to grep for a known bad identifier across a collection of ejson documents and mass-open change requests to affected users, without having access to the respective private keys.

Notes and assumptions about security implications of this change

The new identity field uses one of the strongest, widely adapted cryptographic hash functions - scrypt. Reasons this was chosen:

Still, it introduces an additional representation of the plain text value, and gives attackers the freedom to chose to either attack the hash sum or the existing NaCl box.

I was unable to find a reasonable comparison of overall cryptographic strength between the two operations - mostly because they serve to very different applications. To my understanding both can be considered state of the art today.

Note

Documentation for scrypt implementations urge the user to chose "a good random salt". For our use case, we cannot add a per-secret salt to the hash, as this would defeat the purpose of having it uniquely identifiable across multiple files.

Implications for existing usages of ejson

The change of the wire format is introduced in a backwards compatible way. Ejson will be able to decrypt both schema version 1 and schema version 2 encrypted documents. It will output schema version 2 documents, including the new field per default.

I'd like to discuss if this an acceptable way forward, or if we should introduce an additional command line argument that can toggle between the old and the new behaviour for example. I do realise that this change might not be favourable to all users, as many will have different requirements for their secrets storage solution. There is some more work to be done within the included documentation that is dependent on the outcome of above discussion.

burke commented 3 years ago

It doesn't seem crazy on its face but I kind of wonder if we're solving the problem in the wrong place here, and if caring about this just indicates that we should put more pressure on a move to Hashicorp Vault or something like that.

If we would do this, and unless we're re-encrypt ejson in every repo, it would only apply to new secrets anyway.

I'm not fundamentally against this change, if it's generally seen as the right path forward for now, though I don't feel comfortable asserting that this is a reasonable thing to do security-wise.

Also, not pointing at anything in particular here, but just to remind for future discussion, this repo is public.

clayton-shopify commented 3 years ago

Although scrypt is cryptographically strong, it has fundamentally different security properties than a cipher. In particular, it's possible to check whether a given plaintext is the correct one simply by hashing it. This implies that low-entropy values cannot be protected by scrypt (or any other hashing function), since they can be brute-forced. If we add a hash alongside each encrypted secret, then we'd be transforming ejson from a tool that can safely encrypt any secret to one that can only safely encrypt high-entropy secrets. To me that seems like a large sacrifice.

thepwagner commented 3 months ago

Greetings from the future: this is very stale, has conflicts and never progressed from a draft.

I'm going to close it.