freedomofpress / securedrop-protocol

Research and proof of concept to develop the next SecureDrop with end to end encryption.
GNU Affero General Public License v3.0
47 stars 1 forks source link

Refactor to be KEM-oriented instead of DH-oriented #48

Open bifurcation opened 6 months ago

bifurcation commented 6 months ago

It makes me sad to see things these days that depend on DH as opposed to KEM, since only KEMs. FWIW, I think you could refactor the clue scheme to be KEM-based, if you have a DeriveKeyPair function (as in RFC 9180).

DH-based:

S->SD: g^a, g^ab
SD->R: g^ab, E(g^abc, msgid)
R: (g^ab)^c => msgid

KEM-based:

S: (CT_ac, SS_ac) = Encap(PK_c), (SK_ac, PK_ac) = DeriveKeyPair(SS_ac)
S->SD: CT_ac, PK_ac
SD: (CT_abc, SS_abc) = Encap(PK_ac)
SD->R: CT_ac, CT_abc, E(SS_abc, msgid)
R: SS_ac = Decap(SK_c, CT_ac) 
   => (SK_ac, PK_ac) = DeriveKeyPair(SS_ac) 
   => SS_abc = Decap(SK_ac, CT_abc) 
   => msgid

That shares the properties that (1) the clue is only intelligible by the intended recipient and random otherwise (2) the SecureDrop server can encrypt the msgid to the recipient without knowing the recipient's public key. (In principle, you could also just have msgid=SS_abc and save a symmetric encryption step.) Definitely possible I'm missing some other requirements, though.

eaon commented 6 months ago

First off, I'm entirely with you on the sadness regarding the DH/commutativity dependency, I've been dipping my toes into this problem for a while as well…

As far as I can tell though, your suggestion would break the requirement that third parties ought not to be able to enumerate messages stored on a server at any given point (also over time - #43 discusses this to a certain degree), by re-sharing CT_ac as-is every time clients would poll for new messages. CT_ac would have to be randomisable by a third-party (without changing the encapsulated shared secret) to make this work. I don't understand enough of ML-KEM's math around ciphertexts though so I asked a couple folks if they had ideas how to solve this, but no luck so far. Got any ideas?

PS: Cool work on the ml-kem crate, I've been following (and using it) from afar :sparkles:

bifurcation commented 6 months ago

I'm confused about the enumeration point. It seems like in the current scheme, g^ab is (a) stable per-message and (b) passed on unmodified S->SD->R. So if CT_ac having that property is bad, isn't the current scheme also bad in that way?

I'm also unclear on what you would be trying to achieve by rerandomizing. In the current scheme, a receiver without SK_c learns only list of opaque random values (CT_ac, CT_abc). Repeated queries over time would reveal that some messages were no longer on the server (by opaque ID). In a rerandomized scheme, one could still learn the overall quantity of messages on the server over time. So the major thing that rerandomization buys you is that you can distinguish between the case where no messages are moving and the case where input and output are equal (both of which leave the stored message volume constant). Is that the idea here?

lsd-cat commented 6 months ago

Hi Richard, as per the email exchange, thank you for taking a look into this, we really appreciate it. I think the issue is that this does not represent what we are trying to do originally.

DH-based:

S->SD: g^a, g^ab
SD->R: g^ab, E(g^abc, msgid)
R: (g^ab)^c => msgid

It should be instead:

S->SD: g^a, g^ab
SD->R: g^ac, E(g^abc, msgid)
R: (g^ac)^b => msgid

Where R knows the secret exponent b, and c is randomly generated per request per message.

As per the quantity of messages, we are always returning a fixed number (and hopefully mitigate for side channels there), which is the upper limit of the protocol in its current form (in the low thousands).

bifurcation commented 6 months ago

Ah, thanks @giulio, that makes more sense. I think in my notation, b was held by the SD and c by R, so just aligning your notation to that and elaborating a little:

S->SD: g^a, g^ac
SD: b = rand(), g^ab = (g^a)^b, g^abc = (g^ac)^b
SD->R: g^ab, E(g^abc, msgid)
R: g^abc = (g^ab)^c 
   => msgid
lsd-cat commented 1 month ago

While still retaining the message-fetching part not quantum resistant, https://github.com/freedomofpress/securedrop-protocol/issues/55 is a proposed version of the protocol using PQXDH.