ieee802154_security: Nonce is reused after reboot

chrysn commented 3 years ago

Description

The frame counter used with ieee802154_security is initialized with 0 at startup. While it is protected against overflow, it is not protected against being reset, and that reset happens whenever the device restarts.

As the key is flashed into the device in ieee802154_security's normal operation, and the sender LL address is constant per device, the same nonce (varying only through the resetting frame counter) is used in the AES encryption multiple times. Reuse of the same (nonce, key) breaks confidentiality guarantees.

(AES-CCM is used here, so as I understand it, this is not as bad as it would be with GCM, where nonce reuse leads to key leakage.)
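To make the reuse concrete, here is a host-side Python sketch (not RIOT code) of the nonce construction. It assumes the IEEE 802.15.4 CCM* nonce layout as I understand it: 8-byte extended source address, 4-byte frame counter and 1-byte security level, 13 bytes in total. The big-endian packing, the address and the security level are arbitrary choices for the sketch.

```python
import struct
from typing import List


def ccm_star_nonce(ext_addr: bytes, frame_counter: int, sec_level: int) -> bytes:
    """13-byte nonce: source address || frame counter || security level (assumed layout)."""
    if len(ext_addr) != 8:
        raise ValueError("extended address must be 8 bytes")
    return ext_addr + struct.pack(">IB", frame_counter, sec_level)


# Hypothetical device: the LL address is constant, the security level fixed
# (5 = ENC-MIC-32 in 802.15.4 numbering, chosen arbitrarily here).
ADDR = bytes.fromhex("0011223344556677")
SEC_LEVEL = 5


def nonces_for_boot(n_frames: int) -> List[bytes]:
    """Nonces of the first n_frames after a boot; the counter restarts at 0."""
    return [ccm_star_nonce(ADDR, fc, SEC_LEVEL) for fc in range(n_frames)]


first_boot = nonces_for_boot(10)
second_boot = nonces_for_boot(3)
reused = set(first_boot) & set(second_boot)
print(len(reused))  # 3: every nonce of the second boot was already used
```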

Steps to reproduce the issue

(All done on microbit-v2; I have high confidence in this working on any 802.15.4 encryption capable device).

Sniff for packets, e.g. by building the default example.
Build the gcoap example with USEMODULE+=ieee802154_security
Send out a GET request to the sniffer module, path /hello_world/coap
Repeat the request a few times (to cancel any jitter in the number of messages sent during startup)
Reboot the device, e.g. by power cycling it
Send out GET requests to the same address, path /.well-known/core

Expected results

Requests after the reboot use different sequence numbers.

Actual results

Requests after the reboot start from the same zero sequence number again.
Requests are byte-wise identical in regions of equal content, e.g. (asterisks mine):

~~ SNIP  0 - size:  45 byte, type: NETTYPE_UNDEF (0)
00000000  06  05  00  00  00  9B  51  C4  68  F6  D4  E9  4D  96 *89  5E*
00000010  1F  F2  60  BC *95* 33  80  F2 *20* FE  90  EE  78  4A  D5  74
00000020 *39  91  74* F6  84  6E  DE  C4  CC  3C  05  7A  60

~~ SNIP  0 - size:  45 byte, type: NETTYPE_UNDEF (0)
00000000  06  05  00  00  00  9B  51  C4  68  F6  D4  E9  7A  AB *89  5E*
00000010  0A  D1  65  84 *95* 75  92  FB *20* FD  E2  F2  79  57  CE  7E
00000020 *39  91  74* E5  91  6F  6B  A7  FE  CE  0F  A9  14

(Note the 20 in the second row where the shared "l" of "hello" and "well" is, as well as the "\x04co" of the "core" / "coap" option; the variation is in the MIDs (first row, bytes 12-13), the token (second row, first 4 bytes) and the differing texts.)
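Why equal plaintext shows up as equal ciphertext: CCM encrypts the payload in CTR mode, so with the same (nonce, key) both frames are XORed with the same keystream, and C_A XOR C_B = P_A XOR P_B, which is zero wherever the plaintexts agree. A short Python check over the two dumps above (bytes copied verbatim):

```python
# The two 45-byte frames from the sniffer dumps above.
FRAME_A = bytes.fromhex(
    "06050000009B51C468F6D4E94D96895E"
    "1FF260BC953380F220FE90EE784AD574"
    "399174F6846EDEC4CC3C057A60"
)
FRAME_B = bytes.fromhex(
    "06050000009B51C468F6D4E97AAB895E"
    "0AD16584957592FB20FDE2F27957CE7E"
    "399174E5916F6BA7FECE0FA914"
)

# Same keystream => the XOR of the ciphertexts is the XOR of the plaintexts.
diff = bytes(a ^ b for a, b in zip(FRAME_A, FRAME_B))
equal_offsets = [i for i, d in enumerate(diff) if d == 0]
print(equal_offsets)
```

The offsets past the first 12 bytes (14, 15, 20, 24 and 32-34) are exactly the asterisked ones. The leading 12 bytes match too; those may well be header fields that match regardless of the keystream.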

Versions and cross-references

All versions since the module's introduction in 2021.01, up to the current HEAD.

Since #16841, the module in question has been marked as experimental.

Disclosing this has been discussed in the closed security list, and was deemed responsible given the overall circumstances.

CVE-2021-41061 has been assigned to this issue.

Road forward

This is not trivial to fix, as we don't have any committed persistence inside generic devices, and even with 6TiSCH minimal security the problem is just shifted (section 4.6 requires monotonicity of ASNs on the device, which is equivalent to this problem, although it would shift the attack difficulty to an active replay of old beacons). Likewise, most advanced modes need persistence, until (with ace-ake-authz) asymmetric negotiation comes into play.

Off the top of my head, I don't know any standard solutions that work with neither asymmetric cryptography nor local persistence; some randomness-based scheme could possibly be deployed, but it would be very ad hoc and custom, and eventually not easier than the existing solutions.

I think that the discussion in #16730 can serve as a starting point.

miri64 commented 3 years ago

Could SRAM PUF help here, at least with the nonce generation?

chrysn commented 3 years ago

I don't see a full story from it, but maybe ...

If we went for an entropy approach, the PUF's noise component could help make sure there's enough entropy. (At least on cold starts; do we have something to carry over across warm starts?) But no matter how good our source of randomness is, after a million messages with many reboots in between, a collision still becomes likely (5 bytes = 40 bits ≈ 1000^4 values, and at sqrt(n), i.e. around a million, the birthday chances are good). No standard says that these can be random, but outside 6TiSCH I don't think any implementation has hard requirements on the numbers being ascending either (I'm unaware of any monotonicity-based replay protection, but then again I don't know many 6lo implementations in the first place).
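A rough check of the birthday estimate, under the simplifying (worst-case) assumption that every nonce is an independent uniform draw from the 40-bit space. A random start followed by counting behaves differently, so treat this only as an order of magnitude:

```python
import math


def collision_probability(n_draws: int, space_bits: int) -> float:
    """Birthday approximation: P(any repeat) ~= 1 - exp(-n(n-1) / 2N)."""
    space = 2 ** space_bits
    # -expm1(-x) == 1 - exp(-x), computed without cancellation for small x
    return -math.expm1(-n_draws * (n_draws - 1) / (2 * space))


p = collision_probability(2 ** 20, 40)  # about a million nonces, 5-byte space
print(f"{p:.3f}")  # 0.393
```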

The PUF itself, as I understand it, would only give us more data that doesn't change across reboots (like the key and the LL address already do).

Hm ... I wouldn't see random initialization at startup as a solution to the whole problem, but given that the full solution will need long-term work, it might be a band-aid worth putting on.

(Also, running the numbers made me look into flash cycles: on nrf52 with a 2-page persistence area we get 10k erase cycles × 128 writable words × 2 pages ≈ 2.5 million reboots; I hope to optimize soft reboots so that only hard reboots need flashing. So the randomness approach is probably even practical, but still I wouldn't want to declare it as "just use it that way".)

[edit: fixed cycle numbers]
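The flash-wear arithmetic can be sketched with a toy model (Python; all names are hypothetical, not an existing RIOT module): each boot appends one word that reserves a block of frame counters, and a page is only erased when the log wraps around to it, so the newest value always survives in the other page. Since pages start out erased in the model, it yields pages × words × (cycles + 1) boots, i.e. roughly the product above.

```python
from typing import Optional


class FlashCounterLog:
    """Append-only log of counter values over flash pages; None = erased word."""

    def __init__(self, pages: int, words_per_page: int, erase_cycles: int) -> None:
        self.words_per_page = words_per_page
        self.erase_cycles = erase_cycles
        self.pages = [[None] * words_per_page for _ in range(pages)]
        self.erases = [0] * pages
        self.current = 0  # page currently being appended to
        self.fill = 0     # words used in the current page

    def latest(self) -> Optional[int]:
        written = [w for page in self.pages for w in page if w is not None]
        return max(written) if written else None

    def record(self, value: int) -> None:
        if self.fill == self.words_per_page:
            nxt = (self.current + 1) % len(self.pages)
            if any(w is not None for w in self.pages[nxt]):
                if self.erases[nxt] >= self.erase_cycles:
                    raise RuntimeError("flash page worn out")
                # Only the older page is erased; the full current page keeps the newest value.
                self.pages[nxt] = [None] * self.words_per_page
                self.erases[nxt] += 1
            self.current, self.fill = nxt, 0
        self.pages[self.current][self.fill] = value
        self.fill += 1


RESERVE = 1000  # hypothetical: frame counters one boot may use before writing again


def on_boot(log: FlashCounterLog) -> int:
    """Reserve a block of frame counters and return the first one of this boot."""
    latest = log.latest()
    start = latest if latest is not None else 0
    log.record(start + RESERVE)  # persist the reservation before sending anything
    return start


def boots_until_worn_out(pages: int, words: int, cycles: int) -> int:
    log = FlashCounterLog(pages, words, cycles)
    boots = 0
    try:
        while True:
            on_boot(log)
            boots += 1
    except RuntimeError:
        return boots


print(10_000 * 128 * 2)  # 2560000, the back-of-envelope figure
```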

fabian18 commented 3 years ago

So with SRAM PUF we could start with a random frame counter? As far as I understand the problem, this does not really solve it, because the frame counter sequences of two boot-up sessions could still overlap, and if the second sequence starts at a lower counter than the one the first session finished with, the receiving device would identify the sender's frames as replays. That is, provided the receiving device implements replay protection, which is currently not the case in RIOT, because it requires more memory and adds complexity.
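For illustration, the simplest form of replay protection a receiver could implement (hypothetical; as said above, RIOT does not do this today) keeps the highest frame counter per sender, which costs memory per neighbor. It shows why a random restart below the previous value gets the sender's frames dropped:

```python
from typing import Dict


class ReplayFilter:
    """Accept a frame only if its counter exceeds the highest one seen from that sender."""

    def __init__(self) -> None:
        self.highest: Dict[bytes, int] = {}  # one entry per neighbor: the memory cost

    def accept(self, sender: bytes, frame_counter: int) -> bool:
        last = self.highest.get(sender)
        if last is not None and frame_counter <= last:
            return False  # treated as a replay
        self.highest[sender] = frame_counter
        return True


f = ReplayFilter()
addr = bytes(8)
first_session = [f.accept(addr, fc) for fc in range(500, 510)]  # all accepted
after_reboot = f.accept(addr, 200)  # random restart below 509: rejected
```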

As far as I understand, we would need persistent storage to solve the issue. @maribu suggested using backup_ram, which I think is very promising, but it needs hardware support. I coded something on a local branch, but I am not quite satisfied with it yet. The frame counter starts at 0 on cold boot; on reset and after flashing it continues where it left off. To also avoid starting at 0 on cold boot, we would probably need an external EEPROM.
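A sketch of the backup_ram behaviour described above, modelled in Python (the names and the marker value are hypothetical; real code would place this data in the backup RAM section): the counter survives resets, but after power loss the marker doesn't match and the counter restarts at 0. A bare 32-bit marker also matches random garbage with probability 2^-32, which is one reason to prefer a checksum.

```python
import random

MAGIC = 0x5EC0C7E5  # hypothetical marker meaning "the counter next to me is valid"


class RetainedRam:
    """Model of a RAM section that survives resets but not loss of power."""

    def __init__(self) -> None:
        self.power_loss()

    def power_loss(self) -> None:
        # After power-up the contents are undefined; model them as random bits.
        self.magic = random.getrandbits(32)
        self.counter = random.getrandbits(32)


def boot(ram: RetainedRam) -> int:
    """Return the frame counter to continue from after any kind of boot."""
    if ram.magic != MAGIC:
        # Cold boot: nothing retained, so we fall back to 0; this is the weak spot.
        ram.magic, ram.counter = MAGIC, 0
    return ram.counter


def next_frame_counter(ram: RetainedRam) -> int:
    fc = ram.counter
    ram.counter += 1  # RAM writes are cheap, so update on every frame (no wear)
    return fc
```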

chrysn commented 3 years ago

We could start with a random frame counter with or without the SRAM PUF, but the SRAM PUF gives better randomness on a cold start. (How good our randomness is generally depends on hardware support; the SRAM PUF can provide some in those cases where there is no good hardware entropy source.)

As for backup_ram, what are we talking about here?

Just a memory section that's not cleared on soft reboots? (That won't solve the general issue unless you're OK with the device working only if it's continuously powered from flashing time -- but it can eliminate both wear and sequence number waste in warm reboots, even through hardfaults etc.)
Low-power memory that survives sleep states but is best-effort when the device is unplugged? (Same, but gives the benefits also across the full deep-sleep cycles).
Actually long-term retained memory? That'd be great and yes we can probably manage with this. (But who has something like this, and how is it not just EEPROM?)

In general, the requirements are:

Persistence: Once data is lost, the device can't rejoin the network without rekeying (or the device changing its LL address).
Persistence indication: If data that was once successfully written there is lost, it must be clearly indicated to the reader that data was lost.
Atomicity: If there is a write hole, then power loss at the point of writing would break persistence.

If something can't provide atomicity, it's not too bad -- that can be solved in software using ordered writes and some primitive journaling. If persistence indication is not provided, checksums can fill in.
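The "ordered writes plus primitive journaling" idea, with checksums for persistence indication, might look like this two-slot sketch (hypothetical record layout; CRC32 from Python's zlib). Writes alternate between slots, so a torn write can only damage the older copy, and a reader that finds no valid slot knows the data was lost:

```python
import struct
import zlib
from typing import Optional, Tuple

RECORD = struct.Struct("<IQI")  # generation, counter, CRC32 over the first 12 bytes


def encode(generation: int, counter: int) -> bytes:
    body = struct.pack("<IQ", generation, counter)
    return body + struct.pack("<I", zlib.crc32(body))


def decode(raw: bytes) -> Optional[Tuple[int, int]]:
    """Return (generation, counter), or None for an empty, torn or corrupt slot."""
    if len(raw) != RECORD.size:
        return None
    generation, counter, crc = RECORD.unpack(raw)
    if zlib.crc32(raw[:12]) != crc:
        return None
    return generation, counter


class TwoSlotStore:
    def __init__(self) -> None:
        self.slots = [b"", b""]

    def load(self) -> Optional[Tuple[int, int]]:
        valid = [r for r in map(decode, self.slots) if r is not None]
        # None doubles as persistence indication: the caller must treat state as lost.
        return max(valid) if valid else None  # highest generation wins

    def store(self, counter: int, torn: bool = False) -> None:
        current = self.load()
        generation = current[0] + 1 if current else 1
        target = generation % 2  # always the slot NOT holding the newest record
        data = encode(generation, counter)
        # A torn write (power loss mid-write) leaves a truncated record behind.
        self.slots[target] = data[: len(data) // 2] if torn else data
```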

Populating anything other than flash during flashing could be tricky (I guess the words give it away). There are tricks that can be done with flashing (e.g. we could flash keys and a 0xffffffff first-start indicator; on first start, the device clears the persistent RAM, starts counting there, and invalidates the first-start indicator), but the clearer approach is probably to flash a key-less (or otherwise invalid) configuration and then populate the persistent RAM through the debugger, automated through the console, or whatever. But all of this applies only if viable long-term backed-up RAM is available in the first place.
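The first-start-indicator trick could be modelled as follows (hypothetical names; flash is modelled as NOR flash where programming can only clear bits, so the erased 0xffffffff indicator can be invalidated without an erase). The order matters: the counter is initialised before the indicator is invalidated, so a power loss in between just repeats the harmless initialisation, since nothing has been sent yet.

```python
ERASED = 0xFFFFFFFF


class Device:
    def __init__(self, key: bytes) -> None:
        # Flashed image: the key plus a first-start indicator left in the erased state.
        self.flash = {"key": key, "first_start": ERASED}
        self.pram = None  # persistent RAM; None models "contents lost or invalid"

    def flash_program(self, field: str, value: int) -> None:
        # Programming NOR flash can only clear bits (1 -> 0) without an erase.
        self.flash[field] &= value

    def boot(self) -> int:
        if self.flash["first_start"] == ERASED:
            self.pram = {"counter": 0}             # 1. initialise the counter ...
            self.flash_program("first_start", 0)   # 2. ... then invalidate the indicator
        elif self.pram is None:
            raise RuntimeError("persistent counter lost: the key must not be used")
        return self.pram["counter"]
```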

fabian18 commented 3 years ago

As for backup_ram, what are we talking about here?

Just a memory section that's not cleared on soft reboots? (That won't solve the general issue unless you're OK with the device working only if it's continuously powered from flashing time -- but it can eliminate both wear and sequence number waste in warm reboots, even through hardfaults etc.)

Exactly!

Low-power memory that survives sleep states but is best-effort when the device is unplugged? (Same, but gives the benefits also across the full deep-sleep cycles).

It depends ... See the macro CPU_BACKUP_RAM_NOT_RETAINED. Some CPUs do retain data during sleep mode, e.g. the STM32s have a dedicated regulator for that. Others, such as the saml21, do not, according to their definition of CPU_BACKUP_RAM_NOT_RETAINED = 1.

Actually long-term retained memory? That'd be great and yes we can probably manage with this. (But who has something like this, and how is it not just EEPROM?)

Sorry, the "external" was a bit misleading. It does not matter whether we use peripheral or external EEPROM. I know that at least some AVRs (mega2560) have internal EEPROM. And RIOT's external EEPROM drivers are at24cxxx and at25xxx.

chrysn commented 3 years ago

CPU_BACKUP_RAM_NOT_RETAINED

It does not matter if we use peripheral or external EEPROM

Can we create (or, in light of NIH discussions, find and evaluate) an abstraction for the needed persistence that is backed by whatever is there (or is configured; "device breaks if battery backup ever fails" can be OK for some cases)?

(That'd need a full story for the "non-flash data at flash time" problem, but so what).
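One possible shape for such an abstraction, as a Python sketch (every name here is hypothetical, not an existing RIOT API): backends declare whether they survive power loss and report lost data as None, and a frame counter on top reserves blocks so that slow or wearing backends are written only once per block. Accepting a volatile backend is an explicit opt-in, matching the "device breaks if battery backup ever fails" case.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class CounterBackend(ABC):
    """Hypothetical storage interface."""

    survives_power_loss = False

    @abstractmethod
    def load(self) -> Optional[int]:
        """Return the stored value, or None if it was lost (persistence indication)."""

    @abstractmethod
    def store(self, value: int) -> None:
        """Persist value atomically (a backend may journal internally)."""


class RetainedRamBackend(CounterBackend):
    survives_power_loss = False  # e.g. backup RAM without a battery

    def __init__(self) -> None:
        self.value: Optional[int] = None

    def load(self) -> Optional[int]:
        return self.value

    def store(self, value: int) -> None:
        self.value = value


class EepromBackend(CounterBackend):
    survives_power_loss = True

    def __init__(self) -> None:
        self.cells: Dict[str, int] = {}

    def load(self) -> Optional[int]:
        return self.cells.get("frame_counter")

    def store(self, value: int) -> None:
        self.cells["frame_counter"] = value


class PersistentFrameCounter:
    """Hands out frame counters, persisting a reservation every `reserve` values."""

    def __init__(self, backend: CounterBackend, reserve: int = 256,
                 first_use: bool = False, allow_volatile: bool = False) -> None:
        if not backend.survives_power_loss and not allow_volatile:
            raise ValueError("backend may lose data; set allow_volatile to accept that")
        stored = backend.load()
        if stored is None and not first_use:
            raise RuntimeError("counter lost: this key must not be used any more")
        self.backend = backend
        self.reserve = reserve
        self.next_value = stored if stored is not None else 0
        self.limit = self.next_value  # nothing reserved yet in this boot

    def take(self) -> int:
        if self.next_value >= self.limit:
            self.limit = self.next_value + self.reserve
            self.backend.store(self.limit)  # persist before using the new block
        value = self.next_value
        self.next_value += 1
        return value
```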

benpicco commented 3 years ago

There is also periph_rtc_mem (#16758, #16802) where some RTCs allow you to store a few bytes that are kept during deep sleep.

"device breaks if battery backup ever fails" can be OK for some cases

I'm not sure why it has to break. The device was at some point in a state where it had not joined the network yet. Why can't it simply return to that state if it lost its semi-persistent configuration?

"device needs to be paired again if battery backup ever fails" sounds much better to me.

chrysn commented 3 years ago

Good point, rtc_mem could be another backend.

I'm not sure why it has to break.

Because the device can't join the network again even with re-pairing. A repairing (no pun intended) that makes the device joinable would involve rekeying the full network, at which point we're at the complexity levels of CoJP again.

But to avoid that breakage ...

As a slightly better band-aid, we could probably (warning: again, we're in unspecified territory) work with absolute time, quite like the ASNs of 6TiSCH but less synchronized. If the devices can guarantee that they'll send at most, say, 1000 messages per second (which might be realistic given the radio properties; is 1000 msg/s a bound already enforced by the physical layer?), then the key on the PC would come with an absolute timestamp of when it was created and the message rate. Then any device can be rejoined without rekeying if its sequence number is set to the current time in the relevant scale. There's no need for the device to keep monotonic time, just to count: as long as it adheres to the data rate, its highest sent seqno will always be less than the current time.
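The rate-bound idea can be sketched like this (Python; the names, the example rate and the assumption of perfectly agreeing clocks are illustrative only, and a real scheme would need margins for clock differences). The key's creation time acts as the epoch; a (re)joining device starts at the current time scaled by the rate and then only counts:

```python
from typing import Optional

MAX_RATE = 1000  # messages per second, the example bound from the comment


def seqno_floor(now_s: float, key_epoch_s: float, rate: int = MAX_RATE) -> int:
    """Sequence number a (re)joining device may start from at time now_s."""
    if now_s < key_epoch_s:
        raise ValueError("clock is before the key's creation time")
    return int((now_s - key_epoch_s) * rate)


class RateBoundedSender:
    """Device side: only measures elapsed time, never needs absolute time."""

    def __init__(self, start_seqno: int, rate: int = MAX_RATE) -> None:
        self.start_seqno = start_seqno
        self.rate = rate
        self.sent = 0

    def try_send(self, elapsed_s: float) -> Optional[int]:
        # Never send more than rate * elapsed messages since (re)joining; this
        # keeps every used seqno below seqno_floor() at any later time.
        if self.sent >= int(elapsed_s * self.rate):
            return None
        seqno = self.start_seqno + self.sent
        self.sent += 1
        return seqno
```

Usage: a device joining 100 s after key creation starts at seqno 100000; if it loses its state half a second later, rejoining at 100.5 s yields 100500, above anything it could have sent.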

It'd still be worse than the full 6TiSCH solution, but a key could still live for like 30 years on the usable 5-byte sequence numbers. (Worse because

there's still the issue of getting the persisted state into the device,
we consume sequence numbers faster than ASNs to be pessimistic about clock differences and that stuff, and
unlike in 6TiSCH we have to send these 5 bytes with every message

but still)
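A quick check of the "like 30 years" figure, consuming 2^40 sequence numbers at 1000 messages per second:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def key_lifetime_years(seqno_bytes: int, rate_per_s: int) -> float:
    """Years until a seqno space of the given width is used up at a constant rate."""
    return 2 ** (8 * seqno_bytes) / rate_per_s / SECONDS_PER_YEAR


print(f"{key_lifetime_years(5, 1000):.1f}")  # 34.8
```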

With that, the remaining limitation would be that once persistence is lost, the device needs to obtain a trusted time again in the key's time scale.

chrysn commented 1 year ago

Reading RFC9030 (yes, that's 6TiSCH, but CoJP=RFC9031 doesn't reiterate everything that's there), there's an alternative to keeping sequence numbers locally and rekeying: it says at the end of this section that the nonce is not necessarily built from the MAC address, but can also be built from the short identifier.

That still leaves us with the problem that the device has to know which short identifier it's supposed to be using, which again means we'd need CoJP, but at least this doesn't need extra information in the CoJP (that'd tell it the MAC address), and the CoJP can get away without rekeying the network for as long as it does not run out of short identifiers. CoJP when used with EDHOC doesn't need any local persistence, so it's easy again.
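A toy sketch of the short-identifier approach (hypothetical names; in particular, how the short identifier fills the nonce's address field is an assumption here, not taken from RFC 9030): as long as the JRC never reuses a short identifier under one key, a device that lost its state can rejoin and restart its frame counter at 0 without repeating a nonce.

```python
class JoinRegistrar:
    """Toy JRC that never hands out the same short identifier twice under one key."""

    # 0xFFFE and 0xFFFF have special meanings for 802.15.4 short addresses.
    USABLE_IDS = 0xFFFE

    def __init__(self) -> None:
        self.next_id = 0

    def join(self) -> int:
        if self.next_id >= self.USABLE_IDS:
            raise RuntimeError("short identifiers exhausted: time to rekey the network")
        short_id = self.next_id
        self.next_id += 1
        return short_id


def nonce_from_short_id(short_id: int, frame_counter: int, sec_level: int = 5) -> bytes:
    # Hypothetical layout: the 2-byte short identifier zero-padded into the
    # 8-byte address field, followed by frame counter and security level.
    addr = bytes(6) + short_id.to_bytes(2, "big")
    return addr + frame_counter.to_bytes(4, "big") + bytes([sec_level])


jrc = JoinRegistrar()
first_id = jrc.join()   # initial join
second_id = jrc.join()  # same device rejoining after losing its state
before = {nonce_from_short_id(first_id, fc) for fc in range(10)}
after = {nonce_from_short_id(second_id, fc) for fc in range(10)}  # counter restarted at 0
```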

It may be worth pointing out that there are two kinds of persistence we're having here:

Persistence of counters associated with one key: This is something the application can solve (although I'd really like to have a tool to hand the application as an OS).
Persistence of counters for a key that may be used across a reflash with a completely different example / program. This is what we don't even have a plan for solving -- but using any variant of CoJP (even the OSCORE-only one with the type-1 persistence issue) in combination with short identifiers that the JRC guarantees are not reused on a key gets us around the problem (and I think that's good enough).

chrysn commented 1 year ago

Truth be told, I'm not sure where 802.15.4 says short addresses are legal; I'm investigating, but help is appreciated (@mcr, maybe?).
