Protection against guessing

chrysn commented 1 week ago

If I understand draft-mcnally-envelope-07 right, any receiver of an elided document can undo the elisions if they can guess the elided content, and they then get full cryptographic confirmation that what they found is correct.

For example, if my data is

"Alice" [
    "knows": "Bob"
]

and I publish it as

"Alice" [
    "knows": ELIDED
]

then any receiver who knows who is in our class (Alice, Bob, Eve, Mallory and Sybil) can find out whom Alice knows by simply trying each candidate value until one matches.
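The guessing attack described above amounts to a dictionary attack against the digest that remains in place of the elided object. Below is a minimal Python sketch of it. It assumes, for simplicity, that a leaf's digest is SHA-256 over the value's UTF-8 bytes; real envelopes hash a deterministic CBOR encoding, but the attack works the same way whenever the receiver can recompute the digest. The names `leaf_digest` and `guess_elided` are hypothetical.

```python
import hashlib
from typing import Iterable, Optional


def leaf_digest(value: str) -> bytes:
    # Simplified stand-in for an envelope leaf digest (assumption:
    # SHA-256 over UTF-8 bytes rather than over deterministic CBOR).
    return hashlib.sha256(value.encode("utf-8")).digest()


def guess_elided(target: bytes, candidates: Iterable[str]) -> Optional[str]:
    # The ELIDED position still carries the digest, so the receiver
    # hashes each candidate and compares: a plain dictionary attack.
    for candidate in candidates:
        if leaf_digest(candidate) == target:
            return candidate
    return None


# The issuer elides "Bob"; only its digest remains visible.
elided = leaf_digest("Bob")
classmates = ["Alice", "Bob", "Eve", "Mallory", "Sybil"]
print(guess_elided(elided, classmates))  # prints Bob
```

With five classmates the attack needs at most five hash computations, which is why small, enumerable value spaces are the worrying case.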

This is similar to the trouble we had in RDF/FoaF when foaf:mbox_sha1sum was popular: you might not be able to recover an arbitrary address by brute force, but if you got hold of the list of pseudonyms in use as Hotmail addresses at the time, you could easily deanonymize an otherwise private email address. Similarly (although admittedly a quick search turned up no reliable source, I do remember such a case), censored documents have been uncensored simply because only a specific set of characters had widths that summed to the gap between the words before and after, reducing the problem to finding anagrams.

There is a countermeasure, but it is costly (and that cost is actually one of the factors deterring me from using Gordian for CoRAL; then again, without that cost, these applications would not get the elision benefit either):

All elidable items (envelope, leaf, node, assertion IIUC) could get a salt prepended to them: a random value of perhaps 128 bits. Thus I'd use the following (with abbreviated salts in parentheses, although they wouldn't usually be shown in envelope hierarchies):

(123)
(456)"Alice" [
    (789) (0ab)"knows": (cde)"Bob"
]

and publish

(123)
(456)"Alice" [
    (789) (0ab)"knows": ELIDED
]

Now someone who has an idea of whom Alice might know does not just have to guess that it is Bob (something that may take only a handful of guesses) but also has to guess the salt associated with that value ("cde" would actually be much longer).
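The effect of the prepended salt can be sketched in a few lines of Python. This is a hypothetical model of the proposal above, not the envelope specification: it assumes the digest is SHA-256 over `salt || UTF-8 content`, with a 16-byte (128-bit) salt, and the function names are illustrative only.

```python
import hashlib
import secrets
from typing import Iterable, Optional

SALT_LEN = 16  # 128-bit salt, as suggested above


def salted_digest(salt: bytes, value: str) -> bytes:
    # Hypothetical scheme: prepend the salt to the content, then hash.
    # (Hashing UTF-8 bytes instead of CBOR is a simplifying assumption.)
    return hashlib.sha256(salt + value.encode("utf-8")).digest()


def try_candidates(target: bytes, candidates: Iterable[str],
                   salt: bytes) -> Optional[str]:
    # Dictionary attack that must also supply a guess for the salt.
    for candidate in candidates:
        if salted_digest(salt, candidate) == target:
            return candidate
    return None


salt = secrets.token_bytes(SALT_LEN)  # hidden along with "Bob" when elided
target = salted_digest(salt, "Bob")
classmates = ["Alice", "Bob", "Eve", "Mallory", "Sybil"]

# Knowing the salt, five guesses suffice...
print(try_candidates(target, classmates, salt))
# ...but a wrong salt guess matches nothing, so the attacker would have
# to repeat the name search for up to 2**128 possible salts.
print(try_candidates(target, classmates, bytes(SALT_LEN)))
```

The first call finds "Bob"; the second returns None except with negligible (2**-128) probability, which is the whole point of the salt.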

This is particularly costly when the individual elided items are short (for example, a list of countless sensor values), but that is also when the chance of guessing them is greatest, because the items are small enough that exhaustive guessing is feasible.
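Both halves of that trade-off can be put in numbers. The sketch below assumes a fixed 16-byte salt per item and ignores encoding framing; it compares the salt's size with the payload and counts the values an attacker would have to try for an unsalted item of the same size (an upper bound, since real data such as sensor readings is often narrower).

```python
SALT_LEN = 16  # bytes; the 128-bit salt suggested above (assumption)


def salt_overhead(value_len: int, salt_len: int = SALT_LEN) -> float:
    # Added salt bytes per payload byte, ignoring encoding framing.
    return salt_len / value_len


def guess_space(value_len: int) -> int:
    # Upper bound on the number of values an attacker must try to
    # recover an unsalted item of value_len bytes.
    return 256 ** value_len


for n in (2, 8, 64):
    print(f"{n}-byte value: salt overhead {salt_overhead(n):.0%}, "
          f"at most {guess_space(n):.3g} guesses without salt")
```

For a 2-byte reading, the salt is eight times the size of the value, while the unsalted value has at most 65,536 possibilities: exactly the case where salting is both most expensive and most needed.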

(Don't hold me to the term "salt". It may not fit precisely, because in this scheme it stays secret; or maybe it does fit, because it is exactly as public as the value it salts. At any rate, it's not a nonce, because there is no strong requirement that it be used only once.)

wolfmcnally commented 1 week ago

@chrysn :

This is why envelope supports salting at every level. If you're concerned about guessing an assertion's elided object:

"Alice" [
    "knows": "Bob"
]

Recall that every part of an envelope is also an envelope, and can therefore carry assertions.

So the object of an assertion, being an envelope, can have a 'salt' assertion added to it, where the Salt object is some amount of random data deemed secure. This de-correlates the actual object from its hash in the tree:

"Alice" [
    "knows": "Bob" [
        'salt': Salt
    ]
]

And it still looks the same when received in its elided form:

"Alice" [
    "knows": ELIDED
]
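Why the elided form looks the same yet resists guessing can be illustrated with a simplified digest model in Python. Assumptions, none of them taken from the reference code: leaf digests are SHA-256 over UTF-8 bytes (real envelopes hash deterministic CBOR), the 'salt' predicate is modeled as the string leaf "salt" (real envelopes use a known value), an assertion's digest hashes the predicate digest followed by the object digest, and a node's digest hashes the subject digest followed by its sorted assertion digests. All function names are hypothetical.

```python
import hashlib
import secrets
from typing import Iterable, Optional, Sequence, Union


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf(value: Union[str, bytes]) -> bytes:
    # Simplified leaf digest (assumption: raw bytes, not CBOR).
    raw = value if isinstance(value, bytes) else value.encode("utf-8")
    return h(raw)


def assertion(pred: bytes, obj: bytes) -> bytes:
    # Assertion digest: predicate digest followed by object digest.
    return h(pred + obj)


def node(subject: bytes, assertions: Sequence[bytes]) -> bytes:
    # Node digest: subject digest followed by sorted assertion digests.
    return h(subject + b"".join(sorted(assertions)))


def salted_leaf(value: str, salt: bytes) -> bytes:
    # Digest of  value [ 'salt': Salt ], the salt being an assertion.
    return node(leaf(value), [assertion(leaf("salt"), leaf(salt))])


def guess_object(target: bytes, candidates: Iterable[str],
                 salt_guesses: Iterable[bytes] = ()) -> Optional[str]:
    # Receiver's attack: recompute the digest that ELIDED stands for,
    # both unsalted and with each guessed salt.
    salt_guesses = list(salt_guesses)
    for name in candidates:
        forms = [leaf(name)] + [salted_leaf(name, s) for s in salt_guesses]
        if target in forms:
            return name
    return None


classmates = ["Alice", "Bob", "Eve", "Mallory", "Sybil"]
salt = secrets.token_bytes(16)
salted_bob = salted_leaf("Bob", salt)

print(guess_object(leaf("Bob"), classmates))                # Bob
print(guess_object(salted_bob, classmates, [bytes(16)]))    # None
print(guess_object(salted_bob, classmates, [salt]))         # Bob
```

In both cases the receiver sees only `"knows": ELIDED`; the difference is that the salted object's digest can only be reproduced by someone who also knows the random salt.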

Our reference implementations let you specify the amount of salt yourself, or choose it automatically. The automatic choice is never less than 8 bytes, grows with the message length, and is partly randomized, so receivers of an elided document know neither the salt's contents nor its length.
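A salt-length policy with the three properties described above (an 8-byte floor, growth with message length, and a random component) could look like the following Python sketch. The 5% and 25% bounds and the doubling of the floor for short messages are assumptions chosen for illustration; they are not taken from the reference implementations, and `choose_salt_length`/`make_salt` are hypothetical names.

```python
import secrets

MIN_SALT_LEN = 8  # floor stated above


def choose_salt_length(message_len: int) -> int:
    # Hypothetical policy: a random length between roughly 5% and 25%
    # of the message length (assumed bounds), never below 8 bytes.
    # Integer ceiling division avoids floating-point rounding surprises.
    low = max(MIN_SALT_LEN, (message_len * 5 + 99) // 100)
    # Keep some randomness even for short messages (assumption).
    high = max(2 * low, (message_len * 25 + 99) // 100)
    return low + secrets.randbelow(high - low + 1)


def make_salt(message_len: int) -> bytes:
    # Random salt contents with a random, size-dependent length.
    return secrets.token_bytes(choose_salt_length(message_len))


for n in (10, 100, 1000):
    print(n, len(make_salt(n)))
```

Randomizing the length, not just the contents, keeps the salt's size from leaking a hint about the size of the elided value.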

We made salt opt-in because sometimes you want correlatability, but in cases like the one you mention, for a few bytes more you can opt out of correlatability on a per-field basis.
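Correlatability here means that identical content yields identical digests, so the same elided value can be recognized across separately issued documents. The Python sketch below contrasts the two choices using the same simplified digest model as before (assumptions: SHA-256 over UTF-8 bytes instead of CBOR, and the 'salt' predicate modeled as the string leaf "salt"); `leaf` and `salted` are hypothetical helpers.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf(value: str) -> bytes:
    # Simplified leaf digest (assumption: UTF-8 bytes rather than CBOR).
    return h(value.encode("utf-8"))


def salted(value: str) -> bytes:
    # Simplified digest of  value [ 'salt': Salt ]  with a fresh salt:
    # node = H(subject digest || H(predicate digest || object digest)).
    salt_assertion = h(leaf("salt") + h(secrets.token_bytes(16)))
    return h(leaf(value) + salt_assertion)


# Two separately issued documents that both elide "Bob".
doc1_plain, doc2_plain = leaf("Bob"), leaf("Bob")
doc1_salted, doc2_salted = salted("Bob"), salted("Bob")

print(doc1_plain == doc2_plain)    # True: the two can be linked
print(doc1_salted == doc2_salted)  # False: fresh salt breaks the link
```

Leaving a field unsalted keeps it comparable across documents; salting it trades a few bytes for unlinkability.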

You can see our unit tests for this feature here.

chrysn commented 1 week ago

That's a fair approach. I think the document would be better if it pointed out that this is an option. In particular, it is worth mentioning because the document promises holder-steered gradual reveal, and upholding that promise is dependent on the issuer's choice of salts.

wolfmcnally commented 1 week ago

@chrysn By "the document" are you referring to the Internet Draft? I ask because our implementation of salt is an extension to the base specification. We should probably at least write a document for our research repo. We currently have a document there that describes the tag and format of a Salt object, but nothing that actually specifies the salt extension outside of our current reference codebase. @ChristopherA

wolfmcnally commented 1 week ago

Here is an early document I co-wrote on the subject of decorrelation.

chrysn commented 1 week ago

By "the document" I mean draft-mcnally-envelope: this is where holders are promised that they can do selective reveal. It doesn't have to include the salt specification, but I think it should mention that making full use of that capability requires salting by the issuer.

wolfmcnally commented 6 days ago

Except it doesn't. Sometimes you don't care about decorrelation and sometimes you do. Same thing with elision itself: there are many applications of envelope that don't need elision, and that doesn't make envelope any less capable; those not needing elision don't pay for it and they can just ignore it.

The I-D is a very focused specification and its purpose is not to delineate every possible capability of envelope.

wolfmcnally commented 6 days ago

Elision happens to be baked into the very core of envelope, which is why it appears in the I-D. Decorrelation using salts, not so much. But it's possible and easy to do, and we have reference code for it.

chrysn commented 5 days ago

> Sometimes you don't care about decorrelation and sometimes you do

My point is that those "you"s are different: what the issuer cares about is not necessarily what the holder cares about.

> it's possible and easy to do

Yeah, and having it somewhere else completely suffices specification-wise; there's no strong reason why the mechanism would need to be in here. But this text is where the promise is made that holders can do privacy-preserving elision, and as a reader I would consider that promise misleading until I learned (in this case by asking around) that there is an intended mechanism. Had there been any breadcrumbs to the solution, that misconception would not have occurred.

BlockchainCommons / WIPs-IETF-draft-envelope, issue #9