Message bytes and sizes sent over the network should be indistinguishable from the random stream

yurivict commented 7 years ago

I am looking at what Tox sends over the network. Here are 3 sample sent packets:

 15771 qtox     GIO   fd 26 wrote 113 bytes
       0x0000 0235 204f 8fbb 89ab c134 c4b7 2b80 245a 0b7d 8897 4e0e 8b4e 3f0e b22f 8651 eb3c 0dd3 532a a7fd c8fb 7acf  |.5 O.....4..+.$Z.}..N..N?../.Q.<..S*....z.|
       0x002a c827 d3af 784b 9eaa 108f 181a 2087 f2bb eb22 880c 75e7 e332 a6a2 a047 9a0c 78dd 0a4c cae6 65fc b28e 68f3  |.'..xK...... ...."..u..2...G..x..L..e...h.|
       0x0054 96c6 b582 52bc ba79 a652 6e38 a335 da63 5365 61d0 9874 203e 2c17 6dd5 1b                                  |....R..y.Rn8.5.cSea..t >,.m..|

 15771 qtox     GIO   fd 26 wrote 113 bytes
       0x0000 0235 204f 8fbb 89ab c134 c4b7 2b80 245a 0b7d 8897 4e0e 8b4e 3f0e b22f 8651 eb3c 0ddc c552 c874 db4d 9802  |.5 O.....4..+.$Z.}..N..N?../.Q.<...R.t.M..|
       0x002a ac47 04e1 5fe3 526d 6397 524d 4468 f620 f0b7 3194 9eee 43b9 1d60 21d9 49fd 2f33 cba4 00ce 5968 ab13 6249  |.G.._.Rmc.RMDh. ..1...C..`!.I./3....Yh..bI|
       0x0054 1dfb b178 82cc ae9d 0cb8 4352 38f6 ebcc c7b1 b332 6612 3afe a02e cc23 74                                  |...x......CR8......2f.:....#t|

 15771 qtox     GIO   fd 26 wrote 113 bytes
       0x0000 0235 204f 8fbb 89ab c134 c4b7 2b80 245a 0b7d 8897 4e0e 8b4e 3f0e b22f 8651 eb3c 0d23 eaec 998f 047b 0196  |.5 O.....4..+.$Z.}..N..N?../.Q.<.#.....{..|
       0x002a 8496 04ef 7a04 35c2 8d27 f54d 6a8e faf1 2f37 d464 0bcb fa77 64fc 39e0 3a5b 6592 a112 a859 1e4b f7b5 1bf2  |....z.5..'.Mj.../7.d...wd.9.:[e....Y.K....|
       0x0054 29d2 953d 557f 2888 2d97 23e6 485e 7741 319a ed3a e176 16a4 2dc8 f0c8 fb                                  |)..=U.(.-.#.H^wA1..:.v..-....|

The first 33 bytes are identical in all three messages. The message sizes are all 113 bytes (probably DHT traffic). The UDP port is also always the same, 33445. This makes Tox traffic easily identifiable, and therefore blockable by rogue authority bodies.
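The length of the shared prefix can be measured mechanically. The following is a minimal sketch, not part of Tox: it copies the first 36 bytes of each of the three dumps above, and `common_prefix_len` is a hypothetical helper name chosen for illustration.

```python
# First 36 bytes of each packet, copied from the three dumps above.
P1 = bytes.fromhex("0235 204f 8fbb 89ab c134 c4b7 2b80 245a 0b7d "
                   "8897 4e0e 8b4e 3f0e b22f 8651 eb3c 0dd3 532a")
P2 = bytes.fromhex("0235 204f 8fbb 89ab c134 c4b7 2b80 245a 0b7d "
                   "8897 4e0e 8b4e 3f0e b22f 8651 eb3c 0ddc c552")
P3 = bytes.fromhex("0235 204f 8fbb 89ab c134 c4b7 2b80 245a 0b7d "
                   "8897 4e0e 8b4e 3f0e b22f 8651 eb3c 0d23 eaec")
PACKETS = [P1, P2, P3]


def common_prefix_len(samples):
    """Return how many leading bytes are identical in every sample."""
    count = 0
    for column in zip(*samples):
        if len(set(column)) != 1:
            break
        count += 1
    return count


if __name__ == "__main__":
    print(common_prefix_len(PACKETS))  # 33
```

A 33-byte constant prefix would be consistent with one packet-type byte followed by a 32-byte plain-text public key, though that reading is an inference from the dumps rather than something they prove.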

Here is how it should be: every peer has its own private and public key. A peer's IP address is distributed along with its public key. Every message sent to it, including DHT messages, is padded to a random size and encrypted with the destination peer's public key. The padded size of small messages should be random within some range, and that range should be configurable on the advanced settings page.

Currently Tox can be easily blocked.

If the app sends packets with random-looking content, with packets having different sizes and sent on different ports, it makes identification of Tox traffic very hard.
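The size randomization could look roughly like the sketch below. It is only an illustration under stated assumptions: `pad_random` and `unpad` are hypothetical names, the 2-byte length prefix is one possible framing, and the padding would have to be applied before encryption so that the length prefix itself is never visible on the wire.

```python
import secrets
import struct


def pad_random(payload: bytes, min_size: int, max_size: int) -> bytes:
    """Frame payload with a 2-byte length, then pad to a random total size.

    The total size is drawn uniformly from [min_size, max_size], or from
    [len(framed), max_size] if the framed payload is already above min_size.
    """
    framed = struct.pack(">H", len(payload)) + payload
    low = max(min_size, len(framed))
    if low > max_size:
        raise ValueError("payload does not fit into max_size")
    target = low + secrets.randbelow(max_size - low + 1)
    return framed + secrets.token_bytes(target - len(framed))


def unpad(padded: bytes) -> bytes:
    """Recover the original payload from a padded frame."""
    (length,) = struct.unpack(">H", padded[:2])
    return padded[2:2 + length]


if __name__ == "__main__":
    padded = pad_random(b"ping", 64, 256)
    print(len(padded), unpad(padded))
```

The `min_size` and `max_size` arguments stand in for the configurable range mentioned above.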

iphydf commented 7 years ago

Thanks for the detailed report. This is indeed planned :). There are several steps required for this and it's a breaking change in the protocol. We're holding off on protocol changes until we're ready to make one big change. We're already working on the prerequisites for it (e.g. @robinlinden is working on data schemas). I'll leave this issue open as a reminder of one of the requirements for the new protocol.

yurivict commented 7 years ago

This isn't a breaking change if you allow both protocols during the transition period.

iphydf commented 7 years ago

That is of course what we'll do, but we're simply not there yet. I don't think it's a good idea to fix this particular problem right now in the current protocol. There are other issues with it that wouldn't be resolved by doing only this. The fact that tox can easily be blocked is not an actual issue for anyone right now, so it's safe to defer it until we have a new protocol design.

yurivict commented 7 years ago

> The fact that tox can easily be blocked is not an actual issue for anyone right now

This isn't true. Tox is blocked in several countries. For example, in China.

iphydf commented 7 years ago

I wasn't aware. That is very interesting! Do you have access to a Chinese IP or know a person who does so we can investigate how they are doing it? There are several ways to do it, and changing the packet length fixes only one of them. Here are a few ways:

- Detecting packet lengths. Requires a stateful firewall to recognise patterns - you can't just block all packets of length 113.
- Detecting other patterns in the protocol, e.g. we have plain text public keys always in the same place in DHT packets. Also requires stateful firewall.
- Block port 33445. No stateful firewall needed, just a simple UDP port block.
- Block any access to any of the bootstrap nodes. This is quite likely to be the thing China is doing, given that this is usually the way China blocks anything.
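To make the second item concrete, here is a sketch of how a 113-byte datagram would break down if the layout is one type byte, a 32-byte sender public key, a 24-byte nonce and then the encrypted payload. That layout is an assumption inferred from this thread and the dumps above. The 32/24/16 sizes are the NaCl `crypto_box` public key, nonce and authenticator lengths, and `split_dht_packet` is a hypothetical helper, not a toxcore function.

```python
from typing import NamedTuple

PUBLIC_KEY_SIZE = 32  # NaCl crypto_box public key length
NONCE_SIZE = 24       # NaCl crypto_box nonce length
MAC_SIZE = 16         # NaCl crypto_box authenticator length


class DhtPacket(NamedTuple):
    kind: int
    sender_public_key: bytes
    nonce: bytes
    ciphertext: bytes


def split_dht_packet(datagram: bytes) -> DhtPacket:
    """Slice a datagram according to the assumed DHT packet layout."""
    header = 1 + PUBLIC_KEY_SIZE + NONCE_SIZE
    if len(datagram) < header + MAC_SIZE:
        raise ValueError("datagram too short for the assumed layout")
    key_end = 1 + PUBLIC_KEY_SIZE
    return DhtPacket(
        kind=datagram[0],
        sender_public_key=datagram[1:key_end],
        nonce=datagram[key_end:header],
        ciphertext=datagram[header:],
    )


if __name__ == "__main__":
    packet = split_dht_packet(bytes([0x02]) + bytes(112))
    print(len(packet.ciphertext), len(packet.ciphertext) - MAC_SIZE)
```

Under these assumptions a 113-byte datagram carries 56 bytes of ciphertext, i.e. 40 bytes of plaintext, and the public key always sits at offset 1.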

yurivict commented 7 years ago

I only know that people who travel to China can't use Tox there. I saw several examples.

My suggestion covers all of the above. Nothing at all should be sent unencrypted, including public keys. Seed DHT hosts should be supplied including their public keys.

iphydf commented 7 years ago

How would you design a protocol that never sends anything unencrypted? What would the very first packet sent from A to B look like?

yurivict commented 7 years ago

When A sends a packet to B, this packet is encrypted with B's public key. The seed's pubkeys are hardcoded. The other pubkeys are sent along with their IPs.

iphydf commented 7 years ago

Ok, it's encrypted with B's public key, but how is it authenticated? Or do you mean the first packet is not authenticated?

yurivict commented 7 years ago

What do you mean "authenticated"? If the receiver has the private key it can read it, otherwise it can't.

iphydf commented 7 years ago

How does B know that the packet came from A and wasn't modified by a MITM?

yurivict commented 7 years ago

It can do this the same way as it does this now, only the traffic is encrypted.

yurivict commented 7 years ago

Currently it just shoots the unencrypted packet to the seed. This should be encrypted with the seed's hardcoded pubkey.

In fact, the protocol as it stands is probably fine. You should just add the encryption/oversizing layer. First optionally, later mandatorily.

iphydf commented 7 years ago

Encryption works with a key.
That key should not be known to outsiders.
Public keys are known to outsiders.

How does this work?

iphydf commented 7 years ago

Maybe I understand your suggestion now: are you suggesting that we use the public key as an encryption key for the entire packet for the sole purpose of garbling it, even though anyone can decrypt it?

yurivict commented 7 years ago

Only the owner of the private key can decrypt it.

Private keys aren't known to outsiders. Public keys can be known.

yurivict commented 7 years ago

So, the synopsis of the change I am suggesting:

- Every peer IP will always have its public key stored with it.
- Every message sent to the peer is encrypted with its pubkey.
- Fixed message sizes are randomized within some size range, customizable in advanced options.
- TCP/UDP ports used should be randomized, unless a fixed port value is set by the user. This should particularly apply to the DHT seeds.
- Add a crypto library dependency that supports public-key cryptography.
- If a hybrid asymmetric/symmetric encryption scheme is chosen, packet sizes should be selected with the same randomization procedure.

This can first work in parallel with the current protocol so that the transition will be seamless for the users.
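The port item in the synopsis could be sketched as follows. This is an illustration only: `choose_port` is a hypothetical helper, and the choice of the IANA dynamic range (49152 to 65535) for random ports is an assumption, not something decided in this thread.

```python
import secrets
from typing import Optional

DYNAMIC_PORT_MIN = 49152  # start of the IANA dynamic/private port range
DYNAMIC_PORT_MAX = 65535


def choose_port(fixed: Optional[int] = None) -> int:
    """Return the user's fixed port if set, otherwise a random dynamic port."""
    if fixed is not None:
        if not 1 <= fixed <= 65535:
            raise ValueError("port out of range")
        return fixed
    span = DYNAMIC_PORT_MAX - DYNAMIC_PORT_MIN + 1
    return DYNAMIC_PORT_MIN + secrets.randbelow(span)


if __name__ == "__main__":
    print(choose_port())       # random port in the dynamic range
    print(choose_port(33445))  # user-configured port is kept as-is
```

A randomly chosen port only helps if peers learn it together with the IP and public key, which is why the first item stores them together.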

GrayHatter commented 7 years ago

@yurivict we've discussed this on IRC, and none of us were able to come up with a way we could do this without adding an additional crypto library. Would you like to make that part of your suggestion as well?

yurivict commented 7 years ago

OpenSSL can do this. Are people opposed to it?

Added this item. I am not sure though what the best library for this is.

GrayHatter commented 7 years ago

Personally? Yeah, very. OpenSSL is a very different form of crypto, one I consider much weaker, than NaCl (I'm not a crypto person, you shouldn't take my word for it) and OpenSSL has had a lot of very high profile issues lately. Which also makes me wary. Finally it's hard to use SSL correctly. It's very easy to use NaCl correctly.

On Jan 11, 2017 17:10, "yurivict" wrote:

> OpenSSL can do this. Are people opposed to it?

iphydf commented 7 years ago

Regardless of OpenSSL's complexity and security implications, I think it's not necessary to encrypt the whole packet. An easier way to make it look like random data is: on every packet sent to a DHT node, we create a new key pair and send the public key and nonce in plain text. Both the public key and nonce are random data. The encrypted packet following them also looks like random data. Problem solved. The issue with this is that generating a key pair per packet is computationally very expensive and it adds a bit of memory overhead.

The real question here is: is that actually necessary? This proposal protects against deep packet inspection which requires custom code specifically written to detect tox traffic. I highly doubt that China has done that. It's much more likely that they just blocked all the bootstrap nodes. Would be nice to figure that out before spending a lot of engineering time on temporary protocol fixes.

yurivict commented 7 years ago

> which requires custom code specifically written to detect tox traffic

I think they use Sandvine or a similar system: https://www.sandvine.com/technology/deep-packet-inspection.html They already detect a lot of protocols. For them, adding Tox is equivalent to adding 2 lines of code, which they have probably already done.

iphydf commented 7 years ago

Right, that would be interesting to find out. It's pretty easy for us to determine what they are doing to block tox. I'd like to investigate that before making changes, so we know what is effective.

yurivict commented 7 years ago

I would still go with full-on pseudo-randomness. They will eventually detect Tox if it sends something unencrypted. This isn't difficult at all.

cryptlib https://www.cs.auckland.ac.nz/~pgut001/cryptlib supports the ElGamal public-key algorithm, which can be used to send the symmetric encryption key. It also supports the RSA public-key algorithm.

iphydf commented 7 years ago

I agree it's not difficult, but it will take a few person-hours* to implement, and another few person-hours to review and perhaps iron out bugs. We then need to wait for all clients to be running on the new protocol before we can start deprecating the old one. All of this is quite a lot of work even though it's pretty straightforward in theory. We have very scarce resources, and I think right now fixing this protocol issue and going through the whole protocol change workflow is not worth the time when we have another more complete protocol change coming up in a few months' time. Unless we have evidence that this fix will actually help users right now, I'm reluctant to spend time on it at the moment.

* A few person-hours can already mean a full week in real time. Remember we're all volunteers doing this in our spare time next to full-time jobs. Realistically, I think this change would cost at least 2-3 real-time weeks to implement.

yurivict commented 7 years ago

With this feature Tox will be able to boast that it is the only truly unblockable IM. No other IM has this feature. It will probably gain popularity in repressive countries like China.

yurivict commented 7 years ago

Does anybody object to using libgcrypt for the public key encryption? I looked into it, libgcrypt does what is needed and is quite popular and its interface looks nice overall.

I will implement this feature, and will submit the pull request when done.

yurivict commented 7 years ago

Now, after giving it more thought, I think it is possible to do this entirely with libsodium. A temporary key pair should be generated on the fly by the sender of the DHT message. The temporary, random public key should be attached to the first message, so that the receiver can decrypt it. The stream will look completely random since the receiver's private key isn't known.

GrayHatter commented 7 years ago

@yurivict yeah, that's the idea someone (iphy I think) came up with in #toktok. I was opposed because that could become very cycle expensive VERY fast. It's not unworkable, I'm just not convinced it's the best idea.

yurivict commented 7 years ago

Public key cryptography is more expensive. This is why only the first message should be done this way. A symmetric key should be sent in the first message, and it should be used afterwards.

Involving any other crypto library will have the same problem.

irungentoo commented 7 years ago

The easiest way to block Tox is to just block UDP connections on any port and drop any TCP connections that try to contact a DHT bootstrap node.

If you make the traffic look random, Tox would be pretty much the only protocol to do that, which would make it easy to block.

A better way would be to make the traffic look like another protocol like HTTP or whatever.
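One way a filter might flag "looks random" traffic is a byte-entropy estimate. This is purely an illustrative sketch of that heuristic, not something any particular firewall is known to do, and `shannon_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the empirical Shannon entropy of data in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n"))
    print(shannon_entropy(bytes(range(256))))  # every byte value once: 8 bits
```

Structured protocols such as HTTP headers score well below the maximum, while fully random payloads sit close to it. For a single 113-byte packet the estimate is capped at log2(113), roughly 6.8 bits per byte, so a real classifier would need to look at more than one packet.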

emdee-is commented 1 year ago

I don't think indistinguishable message bytes and sizes are going to make any difference when they are sent over a network of so few bootstrap nodes. And anyone operating in a hostile environment is probably already running Tox over Tor, which works well.

Did the reports of blocking in China include blocking over Tor? Tor itself has implemented pluggable transport mechanisms that are continuously being improved, and these include ways to make the traffic look like another protocol like HTTP or whatever: https://snowflake.torproject.org/

So the best way to handle this may be to improve the documentation in Tox of how to use Tor, and how to configure Tor to use pluggable transports. They are the only things that work in e.g. Egypt or Iran.

It requires fixing

https://github.com/TokTok/c-toxcore/issues/469

To test it we need an equivalent to the other/fun/bootstrap_node_info.py script for TCP connections:

https://github.com/TokTok/c-toxcore/issues/2331

We should not kid ourselves that we don't all live in China - we're all in a lockdown now.

TokTok / c-toxcore

Message bytes and sizes sent over the network should be indistinguishable from the random stream #419