bittorrent / bittorrent.org


Transitioning to stronger hash function #58

Closed the8472 closed 7 years ago

the8472 commented 7 years ago

With the published collision attack on SHA-1, BitTorrent may become less attractive for use-cases which want not just simple integrity-checking (for which it is still perfectly fine) but also authentication.

We should come up with a plan to transition to stronger hashes for all uses, ideally while maintaining backwards compatibility for a transition period.

Thoughts:

bramcohen commented 7 years ago

> Reducing leaf hash sizes doesn't help there unless you also reduce the piece size. Clients do not have access to the leaf hashes until after they have verified a whole piece and can derive them from the data itself.

Huh? The whole point of going Merkle is that you can verify the very first chunk you get, even when all you've got is the root.

> We're already doing a fairly big redesign of the metadata, there's no need to touch the wire protocol.

We need to add hash proofs, if nothing else.

> It still makes sense to request smaller blocks for the tail piece, which is done today. With more tail pieces, which the proposed hash tree would create, it becomes even more useful to be able to request smaller blocks.

What's wrong with just truncating the last piece? Certainly the last chunk should be assumed to be truncated, which makes indices work fine.
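To make the "verify the very first chunk from just the root" point concrete, here is a minimal sketch of checking one block against a Merkle root using a hash proof. It assumes SHA-256, 16 KiB leaves and a power-of-two leaf count; the helper names `build_tree`, `proof_for` and `verify_block` are hypothetical and not taken from any BEP.

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # assumed leaf (block) size


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return all layers, leaf layer first and root layer last.

    Assumes len(leaves) is a power of two (no padding is handled here).
    """
    layers = [list(leaves)]
    while len(layers[-1]) > 1:
        prev = layers[-1]
        layers.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return layers


def proof_for(layers, index):
    """Collect the sibling hash at every layer below the root."""
    proof = []
    for layer in layers[:-1]:
        proof.append(layer[index ^ 1])  # sibling of the current node
        index //= 2
    return proof


def verify_block(block, index, proof, root):
    """Recompute the path from one block up to the root and compare."""
    node = h(block)
    for sibling in proof:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


blocks = [bytes([i]) * BLOCK_SIZE for i in range(4)]
layers = build_tree([h(b) for b in blocks])
root = layers[-1][0]
proof = proof_for(layers, 2)
print(verify_block(blocks[2], 2, proof, root))
```

A receiver holding only `root` needs the block plus `log2(n)` sibling hashes, which is why the wire protocol would have to carry such proofs.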

arvidn commented 7 years ago

> What's wrong with just truncating the last piece? Certainly the last chunk should be assumed to be truncated, which makes indices work fine.

That's a fair point. I have a hard time coming up with any other reasons for unaligned or oddly sized requests. However, I think @the8472 has a reasonable point that adding a new PIECE message can be done independently, and deferring it probably eases the transition.

the8472 commented 7 years ago

I think the draft now contains all necessary components for an upgrade plus some of the "always wanted to do that" changes discussed. General discussion and nitpicks are welcome, nothing is set in stone.

the8472 commented 7 years ago

@bramcohen

> Huh? The whole point of going Merkle is that you can verify the very first chunk you get, even when all you've got is the root.

That is one possible use of Merkle trees. But as specified by BEP30 it is in conflict with other use-cases (discussed in #29). The new draft retains all the old possibilities provided by piece lists and vastly simplifies deduplication. This makes .torrent files, including the root dictionary, about as heavy-weight as info dictionaries were before. But magnets/metadata exchange can now provide a faster startup by grabbing piece-level hashes on demand. If that is not sufficient for some low-latency use-cases then we can, as @arvidn suggested, add another extension in the future to provide wire protocol messages that extend hash information down from the root or the piece layer to the block layer. (Note: if piece length == 16KiB then piece layer == block layer.)

So with some extensions we can get all the goodies of Merkle torrents if we want them, but the draft for the new base protocol aims to cover all the use cases enabled by the old base protocol (and then some).
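The relationship between the block layer and the piece layer can be sketched as follows. This is only an illustration under stated assumptions: SHA-256, 16 KiB blocks, a piece length that is a power-of-two multiple of 16 KiB, and data that fills whole pieces (padding is not handled). `piece_layer` is a hypothetical helper name.

```python
import hashlib

BLOCK_SIZE = 16 * 1024  # assumed block size


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def piece_layer(block_hashes, piece_length):
    """Hash pairs upward until each entry covers piece_length bytes."""
    layer = list(block_hashes)
    covered = BLOCK_SIZE
    while covered < piece_length:
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        covered *= 2
    return layer


data = bytes(range(256)) * 512  # 128 KiB of sample data, i.e. 8 blocks
block_hashes = [h(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]

# With 16 KiB pieces the loop never runs, so the piece layer is the block layer.
same = piece_layer(block_hashes, 16 * 1024) == block_hashes
# With 64 KiB pieces every piece hash covers four blocks.
pieces_64k = piece_layer(block_hashes, 64 * 1024)
print(same, len(pieces_64k))
```

A client that only has the piece layer can still verify whole pieces as before; going below that to individual blocks is what the additional hash messages would be for.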

kyhwana commented 7 years ago

See also https://biterrant.io/ for a Proof of Concept attack involving good/malicious binaries.

bramcohen commented 7 years ago

There's another old idea which we have the opportunity to add now: Instead of peers presenting the infohash directly, they present a hash of the hash and use the actual infohash as a shared secret for kicking off an encrypted connection so that peers who don't have access to the infodict can't download the file. This relates to the general subject of how radical we want to be and how many stages we want to do this in. What I'm going to do now is read through the detailed proposal and critique it primarily from the point of view of forming a rollout plan.
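The hash-of-the-hash idea can be sketched roughly like this. Everything here is an assumption made for illustration: the use of SHA-256 and HMAC, the `b"advertise"` and `b"session"` labels, and the function names `advertised_id` and `session_key`. It is not the existing protocol encryption handshake or anything from a BEP.

```python
import hashlib
import hmac
import os


def advertised_id(infohash: bytes) -> bytes:
    """Value shown to trackers/DHT/peers; does not reveal the infohash itself."""
    return hashlib.sha256(b"advertise" + infohash).digest()


def session_key(infohash: bytes, nonce_a: bytes, nonce_b: bytes) -> bytes:
    """Key that only peers who already know the real infohash can derive."""
    return hmac.new(infohash, b"session" + nonce_a + nonce_b, hashlib.sha256).digest()


infohash = hashlib.sha256(b"example info dictionary").digest()
nonce_a, nonce_b = os.urandom(16), os.urandom(16)

# Both peers know the infohash, so they arrive at the same key.
key_a = session_key(infohash, nonce_a, nonce_b)
key_b = session_key(infohash, nonce_a, nonce_b)
print(key_a == key_b, advertised_id(infohash) != infohash)
```

An observer who only sees `advertised_id(infohash)` and the nonces cannot derive the key without inverting the hash, which is the property the comment is after.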

the8472 commented 7 years ago

Well, the attempt at peer protocol encryption for the purpose of traffic shaping evasion kind of failed after working for a short time. Failed in the sense that it is an arms race and we did not continue to refine it to keep winning that race.

There are numerous other security properties that can be achieved with encrypted peer connections, though. I think some of them can be better achieved with payload encryption (#18, #20), but I'll have to revise that proposal in light of a new torrent format.

arvidn commented 7 years ago

@bramcohen When using protocol encryption today, peers don't advertise the real info-hash, but a hash derived from it. I think the main thing required to achieve a bit more privacy would be to consistently advertise such a derived hash to the DHT, trackers and local peer discovery (and also to reject unencrypted connections).

the8472 commented 7 years ago

That conflicts with making public content easily accessible. Many search sites depend on or supplement their data with DHT-indexed torrents and thus provide a valuable service to users. Of course privacy is also desired by users - many probably would prefer to transfer their data anonymously if that could be had cheaply - but for public content anyone can obtain the torrents, join a swarm and PEX a peer list, so there are limits on what is practically achievable.

Of course there are other privacy aspects. For example passive ISP-level/public wifi surveillance. This could be prevented by unsigned DH-exchanges on the protocol level and for DHT stores. And then there's private data that's not meant for public distribution. In that case payload encryption could solve that issue and also provide at-rest protection for the data. The other measures wouldn't hurt since it wouldn't make sense to have that indexed anyway. Another aspect is censorship of specific infohashes on the network level. I don't recall this happening anywhere except trackers blacklisting particular infohashes in response to DMCAs, but one possible countermeasure there is infohash-hopping with some storage-intensive derivation algorithm.

Anyway, my point is we can cover multiple use-cases at the same time but that requires discussion about several distinct threat models and countermeasures. I think that would be better served by a separate discussion.

zookozcash commented 7 years ago

Here was my argument that BLAKE2 was safer than SHA-256 for use inside one part of Zcash:

https://github.com/zcash/zips/issues/36#issuecomment-204603515

Skip the other argument also in that thread about "provable security" and read the parts about BLAKE2 vs. SHA-256. In short, BLAKE2 has a bigger security margin and has been at least as well-studied as SHA-256, if not better. OTOH, I feel pretty confident about SHA-256. I wouldn't warn you against SHA-256 for BitTorrent; I feel pretty sure SHA-256 will be fine. But when asked to make judgments for Zcash, which is my baby and my love, my paranoia kicks in and I think SHA-256 isn't good enough, and we need something even safer.

That argument (especially considering some details such as Samuel Neves posted: https://github.com/zcash/zips/issues/36#issuecomment-204111300) persuaded Daira Hopwood to use BLAKE2 for that application.


sigmarelax commented 7 years ago

If "good until the end of time" is your goal as you do not want another transition in x years, I don't believe you should use SHA-256.

While not weakened at full-rounds, there have been some notable attacks found, partial list over at wikipedia. Additionally, the US government has been moving secure systems off of SHA-256 since last year, in favor of SHA-384.

It doesn't seem logical to me to implement SHA-256 for the long-term when there are alternatives (particularly one that is well-vetted, faster, has a better security margin).

the8472 commented 7 years ago

I for one am not opposed to stronger hash functions. The question is whether the other devs will agree to them.

Impact:

ssiloti commented 7 years ago

I'm going to go out on a limb and say that 128 bits of security should be enough for anyone. It is enough to put brute force attacks firmly in the realm of boiling the oceans. Even if SHA-256 were subject to a break as severe as SHA-1, losing ~20 bits of security, it would still be comfortably secure against practical attacks with conventional computers. It's going to take a radical new attack to find a SHA-256 collision, and such an attack may well render SHA-384 vulnerable as well.

Note that while SHA-512 and its truncated derivatives are faster in software on 64-bit processors, such CPUs are increasingly likely to have even faster hardware support for SHA-256. Software performance is more likely to matter on embedded systems with 32-bit CPUs where SHA-256 is faster.

Given the trade-offs, I'm willing to take a chance with SHA-256.
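A rough back-of-the-envelope check of the "128 bits should be enough" argument. The attacker rate of 10^18 hashes per second is a purely hypothetical figure chosen for illustration, and the 20-bit loss is the approximate number quoted above.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
ASSUMED_RATE = 10**18  # hypothetical attacker speed in hashes per second


def collision_bits(digest_bits: int, bits_lost: int = 0) -> int:
    """Generic birthday-bound collision security, minus any cryptanalytic loss."""
    return digest_bits // 2 - bits_lost


def years_of_work(security_bits: int, rate: int = ASSUMED_RATE) -> float:
    """Years needed to perform 2**security_bits hash evaluations at `rate`."""
    return 2**security_bits / rate / SECONDS_PER_YEAR


for label, bits in [
    ("SHA-256, generic", collision_bits(256)),
    ("SHA-256 after a ~20-bit break", collision_bits(256, 20)),
    ("SHA-1 after a ~20-bit break", collision_bits(160, 20)),
]:
    print(f"{label}: {bits} bits, about {years_of_work(bits):.3g} years")
```

Under these assumptions a weakened SHA-256 still leaves an attacker with millions of years of work, while a similarly weakened SHA-1 falls within seconds.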

the8472 commented 7 years ago

> it would still be comfortably secure against practical attacks with conventional computers

But if we are debating "until the end of time" security then we have to take quantum computers into account. The BHT algorithm claims to reduce collision resistance to 1/3rd of the hash length instead of 1/2. DJB refutes this, but he leaves open the question of whether that is the final word on the issue.
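As a small worked comparison, this sketch tabulates the collision exponent for several digest sizes under the classical birthday bound (n/2) and under the n/3 figure claimed for BHT. It only restates the arithmetic of the claim and says nothing about whether the quantum attack is practical.

```python
def classical_collision_bits(digest_bits: int) -> float:
    """Birthday bound: about 2**(n/2) work to find a collision."""
    return digest_bits / 2


def bht_collision_bits(digest_bits: int) -> float:
    """Exponent under the claimed BHT cost of about 2**(n/3)."""
    return digest_bits / 3


for n in (160, 256, 384, 512):
    print(f"{n}-bit digest: classical {classical_collision_bits(n):.1f} bits, "
          f"BHT claim {bht_collision_bits(n):.1f} bits")
```

If the n/3 figure held, a 384-bit digest would be needed to retain a 128-bit collision exponent, which is one way to read the argument for longer hashes.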

the8472 commented 7 years ago

> Software performance is more likely to matter on embedded systems with 32-bit CPUs where SHA-256 is faster.

Well, 64bit systems are making a foray into the embedded landscape, e.g. the raspi3.

But looking at some openssl benchmarks it looks like most systems are very anemic anyway and the difference between sha256 and sha512 is a factor of 2-3 at most. So if sha512 is prohibitively expensive then sha256 won't fare much better.

Do we have a case where every ounce of performance matters? I don't really want to be responsible for a "256bits ought to be enough for everyone" a few years down the road if we can avoid it.
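For anyone who wants to reproduce this kind of comparison on their own hardware, here is a minimal throughput sketch using Python's `hashlib`. The 1 MiB payload, the round count and the helper name `throughput_mib_s` are arbitrary choices; results depend heavily on the CPU and on how the underlying library was built, so treat the numbers as indicative only.

```python
import hashlib
import time


def throughput_mib_s(name: str, payload: bytes, rounds: int = 20) -> float:
    """Hash `payload` `rounds` times and return the rate in MiB per second."""
    start = time.perf_counter()
    for _ in range(rounds):
        hashlib.new(name, payload).digest()
    elapsed = time.perf_counter() - start
    return len(payload) * rounds / elapsed / (1024 * 1024)


payload = bytes(1024 * 1024)  # 1 MiB of zero bytes
results = {
    name: throughput_mib_s(name, payload)
    for name in ("sha256", "sha512", "blake2b")
}
for name, rate in results.items():
    print(f"{name}: {rate:.0f} MiB/s")
```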

ssiloti commented 7 years ago

I don't think we have a clear enough picture at this point to predict which functions will prevail in a post-quantum world. Practical quantum computing is still a long ways off and the economics of it are totally unknown, so a wait-and-see approach seems appropriate.

Regarding performance: ARMv8 has an optional crypto extension which supports SHA-256, so 64-bit embedded devices at least potentially carry hardware support. To be fair, it looks like the RPi3 lacks that extension, but it also lacks enough I/O throughput for it to matter.

The main reason to care about performance is power consumption. Reducing power draw by a factor of 2 or 3 is a big deal in mobile and datacenter settings. It also makes it easier to run a client on a cheap VPS as they tend to be heavily oversubscribed for CPU time and less so for bandwidth.

the8472 commented 7 years ago

I guess we'll see in a decade or two. Anyway, I'm mostly waiting on review from @bramcohen. Once any differences are settled we should also generate some test-data and add them to the spec. Due to the merkle trees the complexity is a little higher, so hopefully we can avoid implementation errors that way.

zookozcash commented 7 years ago

Here's a better link to my arguments for why BLAKE2 is safer than SHA2:

https://github.com/zcash/zcash/issues/706#issuecomment-187807410

Note in particular the security margin bit: SHA-256 is breakable up to 31 out of 64 rounds. BLAKE2b is breakable up to 2.5 of 12 rounds (https://en.wikipedia.org/wiki/Hash_function_security_summary). Note also the "SHA-256's father has turned his back on it" argument, which I present not as an argument that SHA-256 is weak, but that SHA-256 will suffer a worse and worse reputation as time goes by.

the8472 commented 7 years ago

Closing this as the BEP has been added as draft.

Feedback is still welcome from implementors, especially if they encounter some roadblocks. At the moment it still is a draft after all.