Open eric1234 opened 2 years ago
I haven't exactly thought hard about this in a while so I think you're probably spot on. I wanted each document in the encrypted database to represent one document transformation, whether that's a create, update, or delete, so that an attacker wouldn't be able to know which documents are being modified. If encrypted records map 1:1 to decrypted documents, then an attacker might concentrate their decryption efforts on a file that changes often, or by monitoring CouchDB server activity (such as if the attacker is the server admin) to know exactly when specific files were updated. Depending on the attacker's knowledge of the program, that information -- what documents are being updated often, and when documents are being updated -- could be very revealing. Instead, although hashing the entire document is a bit heftier of an approach, the attacker can only determine when some kind of document interaction occurred: not which document, nor what kind of interaction.
This is probably overkill for most security paradigms. If you wanted to produce a PR with a different approach, I'd be happy to review it.
I really appreciate the insight. With the right program I could see how keeping a consistent ID might reveal something about the information being stored. I'd be curious if the changing of the ID impacts the Couch replication in any way. Right now the only impact I know of is that compaction wouldn't work. If that is the only impact it's just a trade-off between security vs storage costs.
For your needs, it sounds like changing IDs is worth the storage cost. For other applications being able to compact might be worth the slight loss in security. And of course if you did decide to have a consistent ID do you just hash the real ID or do you hash the real ID + some salt? That would depend on if the application is using the real ID to store information (intelligent key) or the ID is random (UUID).
Because of these different needs if there was a PR it seem the method of generating the encrypted ID would need to be configurable. Maybe a callback function so the application could inject whatever behavior they want. This callback could have a reasonable default. The breaking of compaction seems surprising to me so it seems the default would be best if it kept the ID consistent. Since Pouch/Couch encourages intelligent keys salting that hash also seems good for a default. But then an application could override with something different depending on their needs.
What I'm not sure of is adding an option like this making things too complicated? What about backwards compat? If the default callback was different than the current behavior that would be a breaking change.
You're right about the trade-offs. I could see providing a toggle around how ComDB hashes IDs, allowing the user to choose between salt-hashing the doc (current behavior) or just the decrypted ID, but nothing more complicated than a toggle.
If you feel a toggle is insufficient, I'd be open to hearing a further case for other alternatives.
When generating the
id
for the encrypted record it looks like you are hashing all the data in the record:https://github.com/garbados/comdb/blob/master/index.js#L101-L103
I understand the reason for needing the ID to be deterministic but why use the hash of the entire record? It seems if you change any data on a record that would create a new record vs doing a revision on the previous record. I'm unsure what impact this has on syncing but at the very least it means compaction won't work as from PouchDB's perspective those are two different records rather than the current record and a previous record.
Would it be perhaps better to just hash the cleartext ID to create an obfuscated deterministic ID for the encrypted record? If you were worried about someone determining the cleartext ID (which might contain sensitive data since PouchDB encourages intelligent keys) via rainbow table or something you could always generate a per-record salt to be stored with the encrypted data.
Just wanted to inquire to make sure I'm not missing anything obvious on why that is a bad idea.