commontoolsinc / synopsys

datastore service for datums

Privacy preserving stores #21

Open Gozala opened 4 days ago

Gozala commented 4 days ago

Store and query concealed data

I mentioned previously that, in theory, the current architecture of the store could work with concealed data: instead of sending attribute names in plain text you could hash them (or sign them with a private key) and encrypt the values (those that aren't entities).

That way an observer could only see the shape of the graph, without learning any of the labels or the data they hold.

Caveats

Implementing the above should be fairly straightforward:

  1. hash (or sign) all attributes
  2. encrypt all values

However, that would introduce inefficiency into the system, because attribute namespacing will no longer be reflected in the store: foo/bar and foo/baz will no longer be arranged near each other, and may instead fall anywhere in the keyspace.
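To make the locality problem concrete, here is a minimal sketch, assuming SHA-256 as the hash (the issue does not pick one, and a keyed hash would likely be needed in practice so names can't simply be guessed). `conceal_attribute` and `common_prefix_len` are hypothetical helpers, not part of the store.

```python
import hashlib


def conceal_attribute(name: str) -> bytes:
    """Hash the whole attribute name into an opaque 32-byte key (assumed scheme)."""
    return hashlib.sha256(name.encode("utf-8")).digest()


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes two keys share, a rough proxy for keyspace locality."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# In plain text the two attributes share the prefix "foo/ba" and sort next to each other.
plain_shared = common_prefix_len(b"foo/bar", b"foo/baz")

# Once the full name is hashed, the shared namespace no longer shows up in the key.
hashed_shared = common_prefix_len(
    conceal_attribute("foo/bar"),
    conceal_attribute("foo/baz"),
)
```

Here `plain_shared` is 6, while the hashed keys are effectively unrelated, so a range scan over the `foo/` namespace is no longer possible.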

Goal

I suspect there may be a solution here that introduces privacy while retaining keyspace locality. A very naive approach would be to hash each part of the attribute name individually and concatenate the results, but that would significantly increase key size. That said, there is probably some clever way to do this; someone just needs to put in the effort.
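The naive approach could be sketched like this, again assuming SHA-256 and `/` as the segment separator. `conceal_segments` is a hypothetical name used only for illustration.

```python
import hashlib

DIGEST_SIZE = 32  # bytes per SHA-256 digest


def conceal_segments(name: str, sep: str = "/") -> bytes:
    """Hash each segment of the attribute name separately and concatenate the digests."""
    return b"".join(
        hashlib.sha256(part.encode("utf-8")).digest() for part in name.split(sep)
    )


bar = conceal_segments("foo/bar")
baz = conceal_segments("foo/baz")

# Both keys start with the digest of "foo", so they stay adjacent in the keyspace.
same_namespace = bar[:DIGEST_SIZE] == baz[:DIGEST_SIZE]

# The cost: a 7-byte name becomes a 64-byte key (32 bytes per segment).
key_size = len(bar)
```

Locality is preserved because every attribute under `foo/` shares the same 32-byte prefix, but the key grows by a full digest per segment, which is the size problem mentioned above.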

Gozala commented 11 hours ago

I was thinking more about attribute concealment, specifically whether we could hash each segment while mapping things to the keyspace.

Limitations introduced

  1. hashed attributes mean I can no longer ask "what do you know about this entity?" and get back a list of attributes I can make sense of
    • I mean you could list attributes, but they will be a bunch of hashes, so unless you know what to look for that's not very useful
    • There had been an assumption that what we get here is like JSON, but with hashed keys that isn’t the case
    • On the other hand that is kind of the point: replication shouldn’t give you insight into the data you’re replicating
  2. There are perf implications, obviously, with the added hashing, but that’s probably ok
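
A small sketch of the first limitation, assuming SHA-256 over the attribute name and placeholder byte strings standing in for encrypted values. The `entity` mapping and the `conceal` / `lookup` helpers are hypothetical and only illustrate the idea.

```python
import hashlib
from typing import Dict, Optional


def conceal(name: str) -> str:
    """Hex digest standing in for a concealed attribute name (assumed scheme)."""
    return hashlib.sha256(name.encode("utf-8")).hexdigest()


# Hypothetical concealed view of one entity: hashed attribute -> encrypted value.
entity: Dict[str, bytes] = {
    conceal("profile/name"): b"<ciphertext 1>",
    conceal("profile/email"): b"<ciphertext 2>",
}


def lookup(facts: Dict[str, bytes], name: str) -> Optional[bytes]:
    """A reader who already knows the attribute name can recompute its hash."""
    return facts.get(conceal(name))


listed = sorted(entity)                         # only opaque digests, no readable labels
found = lookup(entity, "profile/name")          # works because the name is known
missing = lookup(entity, "profile/phone")       # None: nothing stored under that name
```

Listing still works, but it only yields digests; discovery by browsing is gone, while lookup by a known name keeps working.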

Gozala commented 10 hours ago

I have also been wondering whether namespacing entities via public key would be a good idea. It would introduce a few interesting tradeoffs:

An alternative could be to simply store authorization in the transaction info and let the replicas decide which changes are valid and which aren’t

Perhaps a hybrid of the two is the best compromise, that is, namespace entities and let replicas decide how to moderate. Some could choose to gate reads and writes while others may act like public boards
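
A rough sketch of what the hybrid might look like, under several assumptions: entity ids are laid out as `<owner public key>/<local id>`, the issuer's key is recorded in the transaction info, and signature verification has already happened elsewhere. All names here (`Change`, `gated`, `public_board`, `replicate`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Change:
    entity: str     # assumed layout: "<owner public key>/<local id>"
    issuer: str     # public key captured in the transaction info
    attribute: str
    value: bytes


def owner_of(entity: str) -> str:
    """The namespace (owner key) is everything before the first '/'."""
    return entity.split("/", 1)[0]


Policy = Callable[[Change], bool]


def gated(change: Change) -> bool:
    """Replica policy that only accepts writes from the namespace owner."""
    return change.issuer == owner_of(change.entity)


def public_board(change: Change) -> bool:
    """Replica policy that accepts writes from anyone."""
    return True


def replicate(changes: List[Change], policy: Policy) -> List[Change]:
    """Keep only the changes this replica's policy considers valid."""
    return [c for c in changes if policy(c)]


changes = [
    Change("key-alice/1", "key-alice", "post/title", b"hello"),
    Change("key-alice/1", "key-bob", "post/title", b"defaced"),
]
accepted_gated = replicate(changes, gated)
accepted_public = replicate(changes, public_board)
```

With these inputs the gated replica keeps only the owner's change, while the public board keeps both; the same data model supports either moderation style.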

———

Either way, capturing an entity’s origin authority (as in public key) seems like a good idea