vanhauser-thc commented 10 months ago

Risk: Low

Summary

An attacker that has the fastest time keeper can use a time compression attack to force the win of a solution his nodes made.

Issue Details

Assumption: an attacker runs a private timekeeper that is faster (e.g. twice as fast) as the the fastest public timekeeper on Subspace.

Due to the faster PoT generation the attacker's nodes can check for solutions earlier than others on the network. When one of his nodes have a PoS solution to a future slot, the attacker can force that this solution is the only one being considered by performing a time compression attack:

Once the slot is near (e.g. 10 slots away for which the solution is for), the held back PoT messages are gossiped to the network at a fast speed, e.g. each next slot PoT being sent every 250ms, then sending the PoS solution for the slot when it is time, and continuing to send PoT messages fast until 10 slots after the solution slot.

This will result in nodes not having the necessary time to look if they have a solution of their own, and when they find that they do, the chain has already advanced too much for them to publish, and the attacker's solution being the only present.

Risk

An attacker that owns the fastest timekeeper can force that his PoS solutions win against others on the blockchain.

Mitigation

Only if the fastest timekeepers are public this risk can be mitigated.

A detection control could also be deployed: a watchdog that analyses the gossiped PoT messages, and when a burst of future PoT slot messages are detected to be aware that this attack was performed. Then governmental countermeasures could be performed or a project for a faster timekeeper ASIC be started.

The risk is low and likely an attacker would gain more by buying more disk space and making it available.

nazar-pc commented 10 months ago

This might be a bit worse that it looks initially.

Attacker here can not only win the competition in case of a particular solution they have, they can "wipe out" other potentially valid solutions in the range of slots between the slot at the tip of the chain and whatever future slot they can produce.

So if future proof included in the best block is 1000 and an attacker can produce a block at slot 1100, all other solutions on the network for slots in the range of 1001..=1099 are thrown away.

Not only this makes regular participants not get rewards, this also leads to solution range adjustment that indicates less space on the network is present than there actually is (more slots between blocks). After a few iterations as solution range gets wider an attacker would be able to win slots with higher and higher probability using the same amount of space.

nazar-pc commented 10 months ago

This is partially mitigated byentropy injection from the blockchain, so they can't create proofs infinitely into the future, but still it seems like with less than 50% of storage they can influence things significantly.

vanhauser-thc commented 10 months ago

yes I had the entropy injection in mind, that is why I set this to low. It depends on how much faster the attacker's timekeeper is; the faster the earlier after an entropy update it can be performed. Once subspace has ASIC based timekeepers deployed I think the "is faster" will be not x2, but until then, if someone builds something himself/herself at a reasonable cost ... (that is another thing I do not find very likely but I am no expert there)

dariolina commented 9 months ago

This is part of a more general issue of a faster timekeeper posing a danger to the protocol. The issue is serious, but I agree that the probability of pulling off this attack seems low. Unfortunately, there is not much we can do about this and #2141 except always making sure the fastest available timekeeper is deployed to the protocol honestly. On the network level, we can deploy a watchdog to monitor real cadence of PoT production and alert of possible advantage gained by some timekeepers. Over the longer term we should also invest in an ASIC research and possibly develop and distribute them over to the community.

vanhauser-thc commented 9 months ago

We were brainstorming about this a bit more in our audit team, maybe a different PoT solution would be beneficial. Monero uses the RandomX approach to prevent that ASIC can be efficiently used. If you would deploy this or something similar, that would:

prevent making attackers any gains by using an ASIC for PoT
the CPU speed improvement is very limited every year, basically negating this attack completely
save Subspace money by not needing to invest in ASIC

With the same checkpoint approach this is also easier to verify in parallel for any node.

The CPU load would be higher compared to the current AES solution, but as this impacts only one single core, I think this is neglectable.

dariolina commented 9 months ago

@vanhauser-thc Thank you for the suggestion! I was reading on it and noticed that one of the claims of RandomX ASIC-resistance is actually the same claim as why we believe an ASIC would not have significant gains for Subspace PoT too (from https://blog.trailofbits.com/2019/07/02/state/):

This generator uses AES-NI instructions to fill the Scratchpad. Generation of the initial Scratchpad uses AES transformations. This algorithm is already hardware-accelerated on modern CPUs, so an ASIC will gain no advantage implementing it.

Preliminary studies we did with Supranational already put a theoretical estimate on max 2-3x gain on AES speed from an ASIC, but it is not clear they are practically achievable.

nazar-pc commented 9 months ago

It is not immediately obvious how RandomX would be better than AES with AES-NI directly. We already know that it is indeed possible to achieve lower AES latency in ASIC (we are proving time here after all) and we know the bounds of the potential speed-up.

vanhauser-thc commented 9 months ago

The advantage is that you would not need ASIC, and an ASIC would not even be possible. Elminating cost for your and pretty much eliminating this attack surface for the subspace blockchain. CPU speed improvements every year are lower than AES speed differences in CPU vs. ASIC (which you guys predict to be x2 to x4).

dariolina commented 9 months ago

Do I understand correctly that RandomX is also parallelizable (over multiple cores, not just instruction-level)? If so it can't be suitable for a VDF use-case

nazar-pc commented 8 months ago

@dariolina the question to our research team is whether RandomX is actually better than raw AES-NI, because if it is, then we'll not have to invest into an ASIC. I suspect it may not be parallelizable internally, but multiple instances can run on different cores of course, but I didn't check it.

I think we need to check that and either close this issue or turn into actionable upgrade plan for PoT.

fengchen06 commented 8 months ago

We do see some advantages of RandomX over AES because of its use of memory-hard functions. However, this doesn't come for free: it seems to increase the memory usage (for both evaluation and verification). Also, the hardware speedup for AES is quite limited according to Supranational. Therefore, we do not recommend this change, at least for now.

On the other hand, we are interested in the following questions: Is there any existing ASIC for AES speedup? If yes, how much is it?

nazar-pc commented 8 months ago

Is there any existing ASIC for AES speedup? If yes, how much is it?

There are many, but they are focused on high throughput, while we need low latency for this use case. I wouldn't expect any of them having lower latency than AES-NI. Theoretical ASIC speed-up was already estimated by Supranational.

nazar-pc commented 3 months ago

@dariolina do you think we can close this?

vanhauser-thc commented 3 months ago

If you want to close this you should put it into a risk/security document that is available for users that documents the assumptions and known risks of the blockchain.

dariolina commented 2 months ago

@vanhauser-thc I have a paragraph on this in the public subnomicon (point no.3)

dariolina commented 2 months ago

Closing as an inherent risk that we don't have ways to directly address right now

autonomys / subspace

[Low] Time compression attack #2193

Summary

Issue Details

Risk

Mitigation