element-hq / element-web

A glossy Matrix collaboration client for the web.
https://element.io
GNU Affero General Public License v3.0
11.28k stars 2.02k forks source link

Implement E2E key verification process #2142

Closed richvdh closed 5 years ago

richvdh commented 8 years ago

We should implement some sort of process to help people do device verification instead of encouraging them to blindly stab at "verify"

richvdh commented 8 years ago

For example, Alice wants to verify Bob’s device:

Future alternatives might involve scanning a QR code or using an NFC bump.

It’s possible to design a process which establishes verification in both directions (so Bob will also know about Alice’s device), but it’s Complicated.

richvdh commented 8 years ago

(See also #2622: the current UX encourages people to stab at 'Verify' to make the ugly triangles go away)

Cadair commented 8 years ago

Alice types it into the verification dialog

Would you need to type it in? Could it not be displayed on both screens like with Signal?

pingiun commented 7 years ago

The implementations in Whatsapp and signal are pretty good UX, except that they use a hexidecimal string for vocal verification. Something like this would be better, keybase uses this and I feel it's very easy to use.

ara4n commented 7 years ago

enough of the rest of e2e is now in place that this should be p1

frabcus commented 7 years ago

This is fantastic! I've a lot to say about this, and I'm not sure if this is the right ticket or it should have a design doc or be split up. Key things that come to mind:

  1. Encryption needs to be on by default, or people won't use it. So that should be definite goal. Hope it is!
  2. Verification with an out-of-band method is not socially understood as needed yet. This means that basically nobody will do it, unless they have a specific privacy need. I think (compulsory but easy) verification via email and/or phone number is a good intermediate method.
  3. Then even if verified by (relatively insecure, but way better than unencrypted) phone/email, the user interface can make clear you're not fully secure, but not in too negative a way (after all, that's as secure as WhatsApp or Signal!)
  4. And lead you into out of band confirming that as described in this issue - by reading something out over a voice call. Ideally this would be four or five words in a common language, rather than long codes. And if codes, definitely don't have both caps and lowercase.
  5. In person verification by QR code is really fun and easy too, if led through it nicely by the user interface.

Phew!

rschulman commented 7 years ago

We ought to be keeping eyes on other ways of transferring keys as well, such as keybase or the new key transparency project from Google. If there are existing methods of key verification, we could either use them or learn from them.

rschulman commented 7 years ago

One other option to consider is whether we can create a more user friendly challenge/response system. If the verifying person can ask the person to be verified a question (obviously over an encrypted channel) that only they would know the answer to and that person can respond with a signed response, that level of security is probably good enough for most threat models.

uhoreg commented 7 years ago

Some of my recent thoughts regarding this:

  1. QR codes are cool and will make things really easy for people with the right equipment, but we need a baseline verification method for people who can't scan QR codes, and this baseline needs to be as easy as possible. My suggestion would be to leave this issue for tracking improvements to the baseline verification, and create separate issues for QR codes/NFC/keybase/etc.
  2. For the baseline, I think we're stuck with "read something out, and have the other person verify what you read", which is basically what we currently have with the base-64-encoded key, but using something easier to read than [a-zA-Z0-9+/]. Currently, we have 43 base-64 digits, so I take it to mean that we have 256 bits to verify.
    1. The "something" that is read out must be able to work with assistive technologies (e.g. screen readers)
    2. There was a suggestion somewhere of using emoji. However, there aren't that many emoji that would be suitable for use (e.g. easily recognizable, easy to distinguish from other emoji, non-controversial, etc.) In fact, I went through the category lists from emojipedia, and I could only pick out less than 100 emoji that I would want to use for this purpose. If we managed to come up with a list of 128 emoji that could be used, then the verification process would involve comparing 37 emoji, which is quite a bit.
    3. I think that the best option is to use a word list. The EFF has published two word lists: a long list (7776 words) and a short list (1296 words). Although the word lists are intended for passphrases, it should be fairly usable for key verification, and can be pared down if some words should be excluded (e.g. if there may be confusion between two words -- although they've already dropped homophones). They seem to have already put a lot of thought into the word lists. If we take 4096 words from the long list (to make math easier), then verifying the full 256 bits would involve comparing 22 words. If we take 1024 words from the short list, then verifying 256 bits would involve comparing 26 words.
      • Alternative word lists are the PGP word list (2*256 words, so verifying 256 bits would involve comparing 32 words), and the list given in RFC1760/RFC2289 (2048 words, so verifying 256 bits would involve comparing 24 words, but some of the words in that list sound similar such as "sung" and "sunk", so may need to be pruned down)
    4. For people who don't speak English, we can provided word lists in different languages (a first cut would be to just translate the English word list, though some words may need to be replaced if there are homophones in the translated language). If two people are trying to communicate, I think it should be safe to assume that they share a common language. This language may not be a user's preferred language (the language that their Riot UI is set to), so the UI should allow switching between languages within the key verification screens. For example, if I was trying to verify keys with someone who spoke French but not English, I would want to be able to use the French word list even though my Riot UI would be set to use English.
    5. We may want to support different paranoia levels for those who don't want to have to verify all 256 bits. For example, something along the lines of "For extremely paranoid, compare these 26 words (256 bits). For moderately paranoid, compare these 16 words (160 bits). For "I just don't want my kids reading my messages", compare these 13 words (128 bits)". (As a point of comparison, PGP fingerprints are 160 bits long)

[end wall of text]

kscz commented 7 years ago

What exactly are we planning on validating through this process? @ara4n seemed to imply in matrix-dev that this was meant to be a per-user+device-pair process, but the language we're using in this issue appears to imply that this will be a "here is my device's public key but as a bunch of simple words."

I made this pull request to make a maximally convenient mechanism for smartphones, but this definitely falls flat for situations where you aren't physically nearby. VoIP calls seem good in theory, but especially given how frequently keys are changing due to the brower-based nature of riot, it seems inconvenient to not have a fast and complete way to operate!

How comfortable are we in providing multiple separate mechanisms to confirm keys? I'm fine having a "code phrase" as a mechanism which works entirely within Riot, but QR codes provide a faster, easier way for those who have the appropriate equipment and are physically close by.

We could present it as a graceful degradation:


Independent of anything else, do we want to make some syntax for a "validate statement"? I was thinking we could have a string like: /verify/${user}/${device_id}/${device_fingerprint}/

The QR code could contain a statement like this and Riot would validate the set (presenting a "I see that information exists for ${user} would you like accept this device as verified?").

I think this would also lead nicely into a mechanism where a client could share keys between devices: my phone and desktop are E2E verified, and I can share/send these validation statements from my phone back to my desktop when I visit a friend.

richvdh commented 7 years ago

@uhoreg and @kscz are thinking along similar lines to me here. I had previously written a few words about this at https://docs.google.com/document/d/1Exe28_ymBPXQbJkhB0lHWgH4Mjb1duwj62NjUKUz0Uc (see in particular "verification interface" and the footnotes), but I don't think I transcribed it here, for which apologies.

Multiple means of verification would work well here, with users picking the most convenient.

One other thought had been to investigate whether it was possible to verify in both directions with a single qr code scan or phrase exchange.

uhoreg commented 7 years ago

One other thought had been to investigate whether it was possible to verify in both directions with a single qr code scan or phrase exchange.

From the Signal blog post, it looks like they're doing that by just concatenating two individual device fingerprints, which is simple enough to do. They say:

so we designed the safety number format to be a sorted concatenation of two 30-digit individual numeric fingerprints.

Though I'm not sure if I would do a straight concatenation, though, if we do something like display a word list on both devices and let the users compare, since some users will start comparing from the beginning, and say "OK, it's the same so far, that's good enough", and essentially only verify one of the two keys. But we could do something like: concat the keys and run the concatenated string through sha256.

uhoreg commented 7 years ago

I think that there are three possible ways that key verification can happen:

  1. asynchronous verification of one party's key: e.g. Alice posts her keys somewhere public, Bob fetches her keys, checks them somehow (e.g. if Alice's keys are PGP-signed), and tells Riot that the key is verified
  2. asynchronous verification of both parties' keys: e.g. Alice and Bob correspond over PGP-encrypted/signed emails, and want to check their Riot keys over that channel
  3. synchronous verification of both parties' keys: e.g. Alice and Bob meet in person/talk over the phone/video conference and verify keys

For 1, Alice needs to be able to extract her keys to sign and post, and Bob needs to be able to verify her key by themselves. This is fairly similar to normal PGP key verification. But there may be multiple devices involved, so it may be nice for Alice to be able to say: "these are all my keys: [blob]", and allow Bob to import the blob and verify all Alice's devices in one go.

For 2, to make things easier, we can combine Alice's and Bob's keys somehow, so that Alice and Bob only need to compare one string to verify the keys simultaneously, similar to what Signal does. However, Alice and Bob may have multiple devices to verify, so do they select all the keys they want to verify and mash them together (which would require them to tell each other what devices they are verifying)? Or else perform m x n key pair verifications? (e.g. Alice's device A with Bob's device a, Alice's device A with Bob's device b, etc. yuck) Or just treat it as 2 times the first case? Or assume that they'll each just verify one key each and rely on #2714 to get the other devices verified?

For 3, I think that we can make things easier by creating a temporary secure channel between Alice and Bob's devices for the purpose of verifying the keys. It would go something like this (warning: lots and hand waving ahead): Alice and Bob tell their devices they want to verify each others' keys. Their devices contact each other somehow and set up a shared secret (maybe something Diffie-Hellman-ish), and then display a code to Alice and Bob. Alice and Bob verify the code with each other to make sure that there was no MITM (as in ZRTP). Once the code is verified, the devices use the secure channel to send their keys to each other, and mark the key that they received as verified. The advantage of this over treating 3 the same as 2 is that since the shared secret is verified immediately after being created, the code used to verify it doesn't have to be as long since an opponent would not have much time to brute force (e.g. it would probably suffice to compare 2 or 3 words from a sufficiently long wordlist. Even comparing a list of emoji might be manageable).

[end wall of text]

Thoughts?

uhoreg commented 7 years ago

I've put a slightly more fleshed out flow for case 3 from my previous comment at: https://cryptpad.fr/pad/#/1/edit/0eqaXl2wZhiq5s5Ty-wzcg/QL4+6V9poZCmbZI0ispNMFbU/ @richvdh can you take a look and see if it seems sane?

kscz commented 7 years ago

@uhoreg reading through the proposal, I'm not onboard. The main issue is that the whole point of end-to-end encryption is that you don't trust the intermediate server(s), and since you're relying on an effectively unauthenticated path for the "use DH key exchange" you don't have much ability to assume that the server didn't MitM the handshake and just replace the handshake and verification code with a different code.

As a quick response to your initial comment, I would say that all of those key verification methodologies can co-exist. We basically already have the first one by virtue of the /verify commands (i.e. Alice can post her key in a /verify command, publicly, and anyone can confirm it).

For the second point I need to go read up on what Signal is doing.

For the third point, the issue is that you need to have the secure channel inherently contain ancillary information which allows you to confirm that it isn't being tampered with. When you're confirming keys, you can't really, programatically know that the person on the other side of the connection is who they claim they are. Either you need to do something like be in person and not use the server to communicate, or you use a channel over the server which would be infeasible for the server to intercept and replace the contents (i.e. a video or voice connection).

The way I foresee these handshakes going in person is something more to the effect of...

1. Simple QR code sharing (least room for error in impl; highest user friction)

  1. Alice and Bob meet in person, and decide to share keys
  2. Alice goes into a Riot menu and hits "show QR code for verification"
  3. Riot shows a QR code containing matrix://verify/@alice:example.com/ALICE_DEVICE_ID/ALICE_DEVICE_SIGNING_KEY
  4. Bob opens a QR reader app, scans the code, it provides a matrix:// URL which Bob clicks and opens with Riot
  5. Riot shows Bob that he has confirmed Alice's key for her device
  6. Alice exits the QR code display mode
  7. Alice and Bob repeat steps 2-4, but with Bob and Alice's positions reversed

Main advantages for this proposal:

Main disadvantages for this proposal:

2. QR code sharing with bi-directional confirmation (more possibility of developer error; lower user friction)

  1. Alice and Bob meet in person, and decide to share keys
  2. Alice goes into a Riot menu and hits "show QR code for verification"
  3. Riot generates a nonce, and displays a QR code containing matrix://verify/@alice:example.com/ALICE_DEVICE_ID/ALICE_DEVICE_SIGNING_KEY/nonce
  4. Bob opens a QR reader app, scans the code, it provides a matrix:// URL which Bob clicks and opens with Riot
  5. Riot shows Bob that he has confirmed Alice's key for her device, Riot asks if he would like to send Alice confirmation of his own device's key
  6. Bob selects "yes", and Riot sends a direct message to Alice with the contents SOME_VERIFY_REQUEST/@bob:example.com/BOB_DEVICE_ID/BOB_SIGNING_KEY/@alice:example.com/ALICE_DEVICE_ID/HASH_OF_MESSAGE_PLUS_NONCE/SIGNATURE_WITH_BOBS_SIGNING_KEY (note that the message does not include the nonce at any point, purely that the nonce is included in the computation of the hash of the message. This is a shared secret which was transmitted over a secure channel which would not be intercept-able by the server)
  7. Alice's instance(s) of Riot would receive the message, each device would see if the message applied to them (by checking the device ID in the messages).
  8. If the message applied to that instance of Riot, it would check for an active QR-code display. Riot would then check that the contents of the message to see if the hash included the correct nonce, the user, device ID and device key match, and if they all match, ask Alice if she wanted to confirm Bob's device
  9. Alice clicks yes, and at this point, Alice and Bob's devices both trust each other

Main advantages for this proposal:

Main disadvantages for this proposal:

3. The Secret Question Method (or the off-the-record method; very similar to what @uhoreg wrote)

  1. Alice and Bob decide they would like to share keys
  2. Alice enters a direct chat with Bob, Bob joins the room
  3. Alice hits a button which says "verify keys"
  4. Alice is presented a list of devices and asked "which of Bob's devices will Bob be verifying?", Alice selects Bob's device
  5. Alice is presented a dialog box with a field labeled "secret question" and "secret answer". She puts in a question like "What did we do on thanksgiving?" and the answer "bury treasure behind the sycamore tree", and hits continue
  6. Riot generates a message SOME_VERIFY_REQUEST/@alice:example.com/ALICE_DEVICE_ID/ALICE_DEVICE_SIGNING_KEY/Where is the treasure buried?/SIGNATURE_OF_HASH_OF_ANSWER_WITH_ALICE_SIGNING_KEY and sends it to the Alice+Bob direct chat room [1]
  7. Riot sees the message and after confirming that the device ID and signing key match, shows Bob a message in the direct chat which says "key verification request" and a button next to the message saying "accept" or "reject"; Bob selects accept
  8. Bob is presented with a dialogue box that shows Alice's question: "What did we do on thanksgiving?" and a box for him to enter an answer
  9. Bob enters "bury treasure behind the sycamore tree", and Riot validates that the hash matches, and tells Bob that Alice's key has been validated
  10. Alice and Bob repeat steps 3-9 with Alice and Bob's positions reversed

Main advantages for this proposal:

Main disadvantages for this proposal:

Summary:

I think doing the work for all of these proposals is feasible and reasonable to do within a good timeframe. The main disadvantage to the QR-code-based proposals is that they're blocked on the complete matrix:// protocol being implemented based on comments by @richvdh and @ara4n in my PR [2]. The "secret question" proposal has a mixed bag of negatives and positives, but fixes a lot of issues with the current workflow.

Any other devs want to chime in on which they would like to see first? I'm pretty motivated to start working on any of the three at this point. I still need to add in a /verify command to both Android and iOS clients (that'll be this weeks project, at least for android).

Footnotes:

[1] - I think you can leave out the hash here to make guessing the answer computationally infeasible for an attacker (you have to hash then validate the signature for the hash, no shortcuts). I may be wrong though, in which case you should add in the hash and the signature of that hash. [2] - PR #828 - As I've stated before in the main chat channels, I'm opposed to the matrix:// URI protocol as I believe it will represent a fairly large attack surface in Riot for iOS and Android devices, but it sounds like the improved user experience wins over the attack surface concerns in the mind of the other devs. I would argue that since Riot already has access to the camera, maybe just pulling in an open-source QR code reading library and ignoring the URI protocol would be better...

uhoreg commented 7 years ago

since you're relying on an effectively unauthenticated path for the "use DH key exchange" you don't have much ability to assume that the server didn't MitM the handshake and just replace the handshake and verification code with a different code.

@kscz That's the point of the handshake code, is that it's derived from the shared secret, and so the only way that the handshake codes are the same is that there wasn't a MitM (or that the adversary has infinite computing power). The attacker can't just "replace" the DH shared secret with their chosen secret, and so they're stuck trying different values to see if they can generate a collision. Hence all the calculations in the long paragraph in my proposal in the cryptpad.fr doc. Is there a specific attack that you're thinking of?

The handshake is essentially the handshake from ZRTP, which was designed by someone who knows more than I do about cryptography.

For the second point I need to go read up on what Signal is doing.

My own impression is that if we have a good synchronous verification system, then we shouldn't worry too much about a good asynchronous two-way verification system.

Regarding the "Secret Question Method", given that it's already implemented in a system, it would be interesting to find out what users' experiences are with it. (I haven't tried it out myself.)

FTR, I don't have any opinions about QR codes, except that it should be implemented in some way since it's convenient for users, and that there must be some other method implemented as well for those who can't use QR codes.

kscz commented 7 years ago

@uhoreg Ah, I missed the fact that ZRTP involves the step "verbally compare the Short Authentication String (SAS)". I also failed to read some of your description correctly: the comment "hash it (see below) and distill it into a short 'handshake code'" didn't read correctly when I was trying to understand your proposal. That effectively has the constraint then that you must use another channel of some sort, and since you mentioned that this was for in-person verification, that makes perfect sense!

My primary concerns with the SAS basically boils down to the same concerns with any similar system where you check a summary:

  1. Trying and validate a long-lived, very important secret from a short-lived, partially-checked, opaque secret seems like a bad idea. It is definitely improbable that it could be intercepted (the provided time constraints and deriving the checked value from the ephemeral key guarantee that), but I normally start with aiming for theoretical perfection and make trade-offs if it's infeasible
  2. if you're already in-person, is there any reason not to aim to validate the whole key directly, and not involve the server at all? Your proposal definitely seems like a good long-distance key confirmation mechanism, though I'd prefer the "secret question" method, as I stated before, lets people decide how difficult to make their secret
  3. The values that the users are checking are opaque, and users might not do a very good job of checking them (i.e. ideally we will implement it in a way where we avoid 0/O and 1/I/l in the same string, etc), whereas words or other familiar things might be easier for the user to check (though the secret question has other issues)

For the secret question thing, I find it pretty straightforward. The comments about secret fatigue are from my own experience confirming secret keys repeatedly when my friends got new computers.

Anyway, apologies for the confusion on ZRTP being possible to man-in-the-middle! I was wrong, and misread things.

richvdh commented 7 years ago

if you're already in-person, is there any reason not to aim to validate the whole key directly, and not involve the server at all?

Each key is 256 bits, and there might be many of them...

The values that the users are checking are opaque, and users might not do a very good job of checking them (i.e. ideally we will implement it in a way where we avoid 0/O and 1/I/l in the same string, etc), whereas words or other familiar things might be easier for the user to check (though the secret question has other issues)

Isn't this why we hash it and use it to choose words from a wordlist?

uhoreg commented 7 years ago

Yeah, the main reason for not wanting to verify the whole key is that it's much longer. As mentioned previously, verifying 256 bits would require checking roughly 22 words selected from a 4096-word word list, and it would need to be done once per key, whereas in my proposal, you most people could probably get away with 6 words and both keys get verified. Though, as I mention, you could have a customizable paranoia level and check a few more words. For example, if you verify 10 words and your attacker has 5,000,000,000 times more computing power than you do, then they have a less than a 1/100,000,000 chance of a successful attack.

Of course, there should also be an option for checking the full key for people who are super paranoid.

BTW, one more downside to the OTR "Secret Question Method" is that it's vulnerable to shoulder surfing. That is, if an attacker observes Alice typing in the secret answer, then the the attacker can impersonate both sides, and there's no way at all to check it.

(I've tweaked my proposal a bit to reduce the time that an attacker is able to try to attack the DH portion.)

kscz commented 7 years ago

Each key is 256 bits, and there might be many of them...

@richvdh Well yes, but things like the QR code solve that. When you're not in person and using something like a voice call, having a shorter verification path is nice, but you can establish a more high-bandwidth, non-lossy channel when in person (NFC, QR code, bluetooth, etc). This is my point about preferring, if possible, to start with trying to solve for the theoretically perfect solution (i.e. actually confirming the whole key) before considering the options which make tradeoffs.

Isn't this why we hash it and use it to choose words from a wordlist?

@richvdh Ah, fair! This still doesn't fix my primary complaint that I don't like using a summary version of the key, but that does solve my gripe about readability. The downside to this is largely that it makes it so that the Riot version of this protocol mandates a wordlist for the implementation in other clients. Something which involves a simpler "show a QR code" or "ask a question and check the answer" has less shared code/large tables of words which have to be used across implementations. Also, as others have pointed out, when trying to establish a wordlist has translation challenges.

Yeah, the main reason for not wanting to verify the whole key is that it's much longer.

@uhoreg But that only matters if the way that you're checking via voice, no? Can we start with another mechanism once we're in-person?

BTW, one more downside to the OTR "Secret Question Method" is that it's vulnerable to shoulder surfing. That is, if an attacker observes Alice typing in the secret answer, then the the attacker can impersonate both sides, and there's no way at all to check it.

I'm not sure I understand how, what would someone do if they saw the answer? The users don't have to enter that secret more than once, and it only serves to validate that the holder of the public key is the person you think it is at that moment. Ideally, users would cycle between multiple personal questions per device pairing.

In person, assuming the "secret question" method was my only verification choice aside from checking the whole key manually, I wouldn't bother using an actual question/answer at all, I would say "Hey Alice, let's verify keys! I'm going to enter in 'buzz112358BLAST)(iguanaGandalf?Rice!'" and then Alice would just enter in that key.

richvdh commented 7 years ago

Each key is 256 bits, and there might be many of them...

@richvdh Well yes, but things like the QR code solve that.

I might have missed something in the noise, but I didn't think anyone was arguing against QR codes (or other mechanisms like NFC). I think we're discussing alternatives for when QR codes don't work well.

uhoreg commented 7 years ago

Yeah, I think everyone is agreed that QR codes or something similar is good, but we still need something usable for people who don't have access to fancy machinery, which is what my proposal is about.

@uhoreg But that only matters if the way that you're checking via voice, no? Can we start with another mechanism once we're in-person?

What other mechanism do you suggest for people who can't do QR codes/NFC/etc. Even doing a visual comparison of 22 words can get tedious (and then it has to be done once for each key). And we probably still need to have something for voice-only communication anyways, for those people who want to verify by phone.

I'm not sure I understand how, what would someone do if they saw the answer? The users don't have to enter that secret more than once, and it only serves to validate that the holder of the public key is the person you think it is at that moment. Ideally, users would cycle between multiple personal questions per device pairing.

It depends on exactly how it's implemented, but assuming a flow like you described above,

  1. Alice initiates the secret question method, and types in the secret question and secret answer
  2. Mallory sees Alice typing it in, and intercepts the message from Alice's device to Bob's device
  3. Mallory initiates the secret question method to Bob's device, pretending to be Alice, and using the same secret question and secret answer, but using Mallory's key instead.
  4. (at some point, Bob fetches Alice's key, but Mallory intercepts and sends her own key instead)
  5. Bob correctly answers Mallory's request.
  6. Riot checks the hashes and tells Bob that the key that he got from Mallory is validated as Alice's key.
kscz commented 7 years ago

Ah, okay, so then we're trying to establish the best mechanism for in-person verification given the constraint that we only have voice verification. Got it, I'll stop conflating the two!

@uhoreg broad comment on the shoulder-surfing attack is that I don't think the Mallory you have supposed feels very realistic. To accomplish that attack, Mallory would need to already have a fake device in Alice's account with a set of valid credentials (in order to send the verify request to Bob), in addition to controlling the server itself (dropping the request from Alice's real device), and being in-person when Alice and Bob are exchanging keys (to see the question and answer typed), in addition to being fast enough to successfully prompt Bob with a request at the same moment Alice sends her request. I can accept the first two conditions because that's pretty much every attack we're discussing, but the remaining conditions seem challenging for any attacker.

Simple fixes:

I guess the real question I have mostly relates to the benefits of the ZRTP method, given the complexity of it's implementation. From what I can tell, the proposal for the ZRTP method requires:

My big complaint about something that complex is that other clients will have to match that implementation to be able to verify keys once E2E is default enabled. A word-list or emoji-list which is canonical for directly comparing the device keys I think I would accept though (with the caveat of course being that I don't actually have any approval power :stuck_out_tongue:).

I think the secret question method could be implemented with a single message containing the device ID, the secret question, and signature of the hash of the answer. If you wanted to limit use to synchronous contexts you could simply add a "valid until" timestamp on the end of that message.

uhoreg commented 7 years ago

For question of whether Mallory would be fast enough to send her request at (almost) the same time as Alice, I don't think it's that hard, if Mallory has some specialized software pre-written. She just has to type as Alice types, and hit "Enter" at the same time. Though this is definitely "targetted attack by a government agency/crime ring" territory, and not "random hacker trying to steal your credit card information". (And apologies for initially calling it just a shoulder-surfing attack -- I really should have called it "shoulder-surfing + MitM".)

I think that the secret question method as you describe it is vulnerable to a normal MitM attack, though, if the users don't select an answer with enough entropy. The attack would go something like:

  1. Alice initiates the secret question challenge, and sends the SOME_VERIFY_REQUEST/@alice:example.com/ALICE_DEVICE_ID/ALICE_DEVICE_SIGNING_KEY/What is your favourite colour?/SIGNATURE_OF_HASH_OF_ANSWER_WITH_ALICE_SIGNING_KEY
  2. Luckily for Mallory, Alice's chosen secret question is "What is your favourite colour", and Bob, being male, can only identify 10 distinct colours (😛), so Mallory makes 10 guesses, verifies Alice's signature on each of them, and finds that the answer is "blue".
  3. Mallory then sends the message SOME_VERIFY_REQUEST/@alice:example.com/ALICE_DEVICE_ID/MALLORY_DEVICE_SIGNING_KEY/What is your favourite colour?/SIGNATURE_OF_HASH_OF_ANSWER_WITH_MALLORY_SIGNING_KEY

So it really relies on Alice and Bob properly selecting a shared secret, which one part that worries me.

From what I can tell, the proposal for the ZRTP method requires:

  • a mechanism to generate a DH tunnel inside a normal set of chat messages

It would probably be a set of special to_device messages. I haven't worked out the details, but to me, it doesn't seem that hard. The messages would basically be "Hi, I'm device xxx, and I want to verify with you", "Here are my DH inputs, and parameters for hash", and "My owner says we're good, so given our DH negotiation, here is the HMAC of my key".

  • a mechanism to compare a 256-bit key (or subset/digest of that key) vocally

I'm assuming you're talking about the DH shared secret, and I would imagine that it would be a word list or emoji list.

Is there a benefit to not comparing the device keys directly once you implement this?

Again, it's the length of the comparison. Comparing 2 keys (Alice's and Bob's) would take about 44 words, which most people would run away from. Whereas with the method that I'm proposing, you can probably get reasonable security with 6 words, and pretty good security with 10 words, which would not be that hard for people to do. Of course, someone else should check my math. And maybe do a proper "verifying n bits of the secret handshake is just as secure as verifying m bits of the device key" comparison. I may try my hand at it myself tomorrow; it doesn't seem that hard. I would expect that it would compare pretty favourably, though, since 1) even though we have an n/2 due to birthday attack, that's essentially cancelled out by the fact that we're verifying 2 keys; 2) we can use PBKDF2/bcrypt/something similar to slow the attacker down, whereas bruteforcing a key doesn't have this slowdown; and 3) the attacker only has about 2 seconds to mount the attack, whereas attacking a long-lived key could be done over the course of months or years.

  • timers to limit the scope of an attacker's window

That doesn't seem like much of a complication to me.

My big complaint about something that complex is that other clients will have to match that implementation to be able to verify keys once E2E is default enabled.

I personally don't think that it would be that complicated, compared to all the other e2e stuff, if the DH, hash, and hash-to-words/emoji are implemented in libolm.

kscz commented 7 years ago

She just has to type as Alice types, and hit "Enter" at the same time.

Well, Mallory has to have the server configured to stop verification attempts between Alice and Bob (or at least to delay them for some period). Then she also has to be present, with her computer open at the time that Alice and Bob attempt a verification, and then she needs to see what's typed... is this not defeated by just making the "answer" field a password/opaque field?

So it really relies on Alice and Bob properly selecting a shared secret, which one part that worries me.

Is that any different than your proposal? You have a "paranoia level" setting, and if the users only choose to check the first word, or sends the words they see through Riot (i.e. Bob types "Hey Alice, I see 'waffle monkey peanut piano', that match what you see?") which can similarly be swapped out by a dedicated attacker. All I'm really getting at is: you can't make users care more about trust, and a lazy user will always ruin your security model.

In this case, I'm assuming that the server can't cheaply interpret natural language strings and then guess an answer, so even a mediocre question will still provide a pretty high barrier to attack. Especially given that the functionality the server has to undertake is to guess quickly enough (which involves generating a guess, hashing that guess, and then validating that the signature matches that hash). I think the server would struggle to make an appreciable number of guesses before the delay became obvious/an issue.

Assuming that the server couldn't interpret natural language strings, what would a dedicated attacker do to cause the server to guess? Have collected sets of words with categories and spin the malicious server into action with hinting? i.e. server tells Mallory "I see this question between Alice and Bob" and Mallory tells the server: "colors" and the server has a categorized wordlist? Or something else?

Again, it's the length of the comparison. Comparing 2 keys (Alice's and Bob's) would take about 44 words, which most people would run away from.

But you're going to put in a "paranoia level" anyway; why not just make the same thing for the main public keys? Paranoid users can compare more pieces of the public key, and less paranoid users can compare less of the public key. I'm not quite understanding what comparing the DH ephemeral key is buying you over just verbally comparing some (or all) of the main public key.

Whereas attacking a long-lived key could be done over the course of months or years

Since Alice's client and Bob's client connect immediately, the only way the malicious server could find a non-paranoid collision would be to swap out Alice's and Bob's devices for different devices. I guess I can see that:

  1. Alice logs in and gets a long-lived key
  2. Bob logs in and gets his long-lived key
  3. Alice connects to Bob, and now Alice's client and Bob's client both have a known key.
  4. Mallory finds a collision for Alice's and Bob's public key SAS over the course of the next 6 months
  5. Mallory blocks all packets between Alice and Bob, and provides the malicious devices as MitM targets
  6. Both Alice and Bob will see that the other has has 2 devices (their clients will remember real devices)
  7. Bob to Alice: "hey your device isn't validated anymore"
  8. Alice to Bob: "Huh that's weird. I see the same thing..."
  9. They attempt revalidation. So long as the collision is good enough for the level of paranoia chosen, both users will successfully validate against Mallory

Though really, it should be a massive red flag to Alice and Bob that they suddenly get new devices on the server that they can't control or delete (Mallory can't trick the clients into having the new malicious identity)

uhoreg commented 7 years ago

is this not defeated by just making the "answer" field a password/opaque field?

It is defeated if Mallory can only see the screen, but not if she can observe the keyboard. (Though on many mobile devices, the keyboard is on the screen.)

So it really relies on Alice and Bob properly selecting a shared secret, which one part that worries me.

Is that any different than your proposal?

Yes, because in my proposal, the client can tell you what to verify in order to achieve a certain security level, while in the "secret question" system, it's all up to the user's judgment. In my proposal, the client can tell the user: "if you want to be safe from corporate spying from a medium-sized company, verify n words", whereas with secret questions, the client can't say "Ooh, asking for Bob's mother's maiden name isn't secure enough, but asking the make of his first car should do it." With the "secret question" method, it's entirely up to the user to figure out what is secure and what isn't, and I don't know how much users can be trusted to do that accurately.

In this case, I'm assuming that the server can't cheaply interpret natural language strings and then guess an answer

It doesn't really need to interpret much natural language, since the class of answers is usually contained in the question itself. If the question is "What is your favourite colour", then the server sees "colour" in the question, and guesses a bunch of colours. If the question is "What is your dog's name", the server sees "dog" and "name", and guesses a bunch of common dog names. Even if it couldn't guess the class of answers, it can have a dictionary of common answers. The dictionary might contain the answer to any of Alice or Bob's secret questions, but more than likely, it will contain answers to a significant number of users' questions.

Especially given that the functionality the server has to undertake is to guess quickly enough (which involves generating a guess, hashing that guess, and then validating that the signature matches that hash).

You say "server", but I'm assuming that the attacker has multiple servers at their disposal. Generating a guess is not hard: you have a dictionary. As for hashing the guess, if the attacker has a dictionary, they can create a rainbow table.

Again, it's the length of the comparison. Comparing 2 keys (Alice's and Bob's) would take about 44 words, which most people would run away from.

But you're going to put in a "paranoia level" anyway; why not just make the same thing for the main public keys? Paranoid users can compare more pieces of the public key, and less paranoid users can compare less of the public key.

Yes, and I actually already suggested that (see point 2.v.).

But again, it's a question of the length of the comparison. I suspect that with my proposal, to get the same level of security, you'd need to compare fewer bits than if you were verifying part of the whole key. Which is why I say later on in the paragraph with the sentence that you quoted: "And maybe do a proper "verifying n bits of the secret handshake is just as secure as verifying m bits of the device key" comparison." Although thinking about it more, I think that the math (or rather the setup) may be a bit harder than I thought at first, and I may just start with "verifying n bits of a secret handshake is just as secure as using an m bit symmetric key" which, although it might not as applicable, will still be somewhat helpful since it will provide a comparison with something that we're more used to thinking about.

Since Alice's client and Bob's client connect immediately, the only way the malicious server could find a non-paranoid collision would be to swap out Alice's and Bob's devices for different devices.

That's the same as any of the other proposals, though. Even in my proposal, Alice and Bob are just verifying keys that were downloaded before. Maybe that wasn't clear -- I'll make a note of it in my proposal.

kscz commented 7 years ago

I won't bother continuing to defend the secret question methodology since both @uhoreg and @richvdh seem to dislike it :stuck_out_tongue:. I will say that I don't think it's subject to rainbow table issues for a variety of reasons, but if nobody likes it I'll shut up about it!

I would like to hear more of a fleshed-out examples of how all the stuff after the DH key confirmation is going to happen: are we adding more message types to the core protocol? What will the messages to confirm stuff look like? Will it simply be transmitting the full signing key to both sides, confirming, and then discarding the DH channel you've formed?

I am still curious about any sort of analysis which informs the reason to add the whole DH comparison vs comparing a subset of the signing key! I agree that the analysis is a pain in the butt, but I think we may also be at the point where it's necessary to compare.

A random proposal on just comparing the pubkey via word/emoji list:

  1. Alice hits a button saying "verify your key"
  2. Riot prompts Alice to select a level of paranoia (showing her a scale which says "# of words" with a minimum of something like 6 words/emojis)
  3. Alice picks the lowest level, and is shown "3, 4, 8, 9, 22, 37" followed by a list of six words/emojis
  4. Bob hits a button which says "verify Alice's key for device AABBCC123", and asks him for the 6 numbers Alice sees
  5. Bob enters 3, 4, 8, 9, 22, 37
  6. Alice and Bob are presented the word/emoji interpretation of Alice's key, and confirm it over the phone or in-person
  7. Repeat steps 1-6 with Alice and Bob's positions reversed

That way the attacker would have to have a birthday collision with an unknown subset of the key to successfully attack it. And Alice and Bob could always use the whole set of words (maximum paranoia) to defeat an attack if they're truly concerned about it.

One major advantage: the server is not involved with this protocol in any way. The point at which the users have verified each others devices is invisible to the server. There's nothing to intercept/alter, so the attack model is a lot simpler. Implementation is entirely in UI, no new messages/whatever.

That's the same as any of the other proposals, though. Even in my proposal, Alice and Bob are just verifying keys that were downloaded before. Maybe that wasn't clear -- I'll make a note of it in my proposal.

No, no, that was a comment about the fact that I wasn't understanding this comment:

Whereas attacking a long-lived key could be done over the course of months or years

How would Mallory take advantage of finding a collision for the SAS of a long-lived key? I was trying to figure out how you were pivoting from "the long-lived key is more attackable because it's longer-lived" to an actual attack. Do you have an example of how Mallory would take advantage of this?

uhoreg commented 7 years ago

I've added more details to the post-DH phase of my proposal.

I also took another look at ZRTP, and noticed that it actually had one more step in it that prevents a MitM from being able to use brute force to attack the DH verification. By having one side send a hash the DH parameters before the DH actually starts, it locks the DH exchange so that a MitM would only have one guess to attack the DH exchange, and they have a 1 in 2^n chance of success if Alice and Bob verify n bits. (That Zimmermann guy is pretty clever.) It doesn't matter how much computing power the attacker has -- they get exactly one guess and that's it. Which means that if Alice and Bob check 4 words from a 4096-word word list, that's 48 bits, and an attacker has a 1 in 2x10^14 chance of success, which is extremely small. 48 bits is also 7 emoji (from a list of 128 emoji) or 12 hex digits (e.g. "0123 4567 89AB"), which is still fairly manageable for people, even with the user-unfriendliness of hex digits.

Comparing a subset of the key is doable, but requiring users to enter the bit positions that are being verified just makes it more painful.

uhoreg commented 7 years ago

I've implemented a proof-of-concept version of my proposal as a demo page that logs in two sessions and goes through the steps of my proposal. The source code is at https://gitlab.com/uhoreg/matrix_key_verification and I've put up a copy at https://static.uhoreg.ca/key_verification/

Ashaman- commented 6 years ago

It would be nice if we could associate a PGP key directly with an account (ie. if an email address associated with an account has an associated PGP key, it should be immediately apparent). Accounts with a published & proven PGP key could then sign and publish their E2EE keys, ideally allowing for seamless in-client E2EE key verification based PGP signatures.

ara4n commented 6 years ago

We now have two proposals for proper verification UIs at: