This repository contains the code for the experimental CHERIoT network stack. This is an early work-in-progress implementation that has been made public to facilitate easier collaboration.
WARNING: This is not fully hardened against our desired threat model. It should not be used in production. Note especially that there is no strong entropy source on the Arty A7 FPGA prototyping platform and so any TLS connection is moderately easy to compromise. It is expected to approach production quality in 2024H2.
The network stack includes components from a variety of third parties:
This demonstrates the CHERIoT platform's ability to adopt existing codebases. We are building around 100 KLoC of third-party code into this stack. This is mature and well-tested code that we have no desire to rewrite. A side-channel-resistant TLS implementation, for example, would be a huge undertaking.
Our additions place security boundaries around these existing components and provide less-general APIs that are tailored for our expected use cases. The new code in this repository is around 3% of the size of the large components that we are reusing with no code changes.
The initial implementation has six compartments. Four are mostly existing third-party code with thin wrappers:
These are joined by two new compartments:
The communication is (roughly) summarised below:
graph TD
Network
subgraph Firewall["On-device firewall"]
DeviceDriver["Device Driver"]
end
TCPIP["TCP/IP"]:::ThirdParty
User["User Code "]
NetAPI["Network API"]
SNTP:::ThirdParty
TLS:::ThirdParty
MQTT:::ThirdParty
DeviceDriver <-- "Network traffic" --> Network
TCPIP <-- "Send and receive Ethernet frames" --> Firewall
NetAPI -- "Add and remove rules" --> Firewall
TLS -- "Request network connections" --> NetAPI
TLS -- "Send and receive" --> TCPIP
NetAPI -- "Create connections and perform DNS requests" --> TCPIP
MQTT -- "Create TLS connections and exchange data" --> TLS
User -- "Create connections to MQTT server and publish / subscribe" --> MQTT
MQTT -- "Callbacks for acknowledgements and subscription notifications" --> User
SNTP -- "Create UDP socket, authorise endpoints" --> NetAPI
SNTP -- "Send and receive SNTP (UDP) packets" --> TCPIP
TLS -- "Request wall-clock time for certificate checks" --> SNTP
style User fill: #5b5
classDef ThirdParty fill: #e44
The TCP/IP stack is a large compartment with a lot of state. It is fault-tolerant: when an error is triggered (CHERI spatial or temporal safety fault, assertion), the compartment is automatically reset to a pristine state and restarted. We expand on this capability below.
Unlike the TCP/IP stack, the TLS compartment is almost completely stateless. This makes resetting the compartment trivial, and gives strong flow isolation properties: Even if an attacker compromises the TLS compartment by sending malicious data over one connection that triggers a bug in BearSSL (unlikely), it is extraordinarily difficult for them to interfere with any other TLS connection.
Similarly, the firewall is controlled by the Network API compartment. The TCP/IP stack has no access to the control-plane interface for the compartment. A compromise that gets arbitrary-code execution in the network stack cannot open new firewall holes (to join a DDoS botnet such as Mirai, for example). Note that there are currently some technical limitations to this; see the discussion below. The worst it can do to the rest of the system is provide malicious data, but a system using TLS will have HMACs on received messages and so this is no worse than a malicious packet being injected from the network.
All of this is on top of the spatial and temporal safety properties that the CHERIoT platform provides at a base level.
Note on the isolation of the firewall control plane.
In the current implementation, the TCP/IP stack still indirectly controls which endpoints the firewall allows/disables because the Network API compartment operates with domain names, and the firewall with IPs, and the TCP/IP stack controls the translation between the two.
For instance, if the application tells the Network API that the only endpoint it will ever communicate with is example.com, the Network API will need to translate that domain name into an IP to create a firewall entry.
The TCP/IP compartment is responsible for doing the translation through DNS.
Thus, a compromised TCP/IP stack can spoof the DNS translation and return whichever IP address it wants to connect to, to create a corresponding firewall entry.
This attack scenario comes with the limitation that DNS resolution and firewall updates only happen when establishing a new connection. However, in many cases the TCP/IP stack can trigger this arbitrarily by closing the sockets opened by the application to force the application to trigger another socket open (and thus to re-establish the connection and re-translate the domain name).
Looking forward, we are planning to address this limitation by moving the DNS lookup to a separate compartment. Unfortunately, without DNSSEC, the network stack can still tamper with responses, so this will also require a firewall-layer bypass to send DNS responses to the DNS compartment instead of the network stack.
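The limitation described above can be illustrated with a small model. This is only a sketch in Python, not code from this repository: the `Firewall` class, `open_connection`, both resolver functions, and the IP addresses (taken from the RFC 5737 documentation ranges) are all hypothetical names chosen for illustration.

```python
class Firewall:
    """Toy firewall that only knows about IP addresses."""

    def __init__(self):
        self.allowed_ips = set()

    def allow(self, ip):
        self.allowed_ips.add(ip)


def open_connection(hostname, allowed_hosts, resolve, firewall):
    """Toy Network API: check the host name, resolve it, open a firewall hole.

    `resolve` stands in for the DNS lookup done by the TCP/IP compartment.
    """
    if hostname not in allowed_hosts:
        raise PermissionError(f"{hostname} is not an authorised endpoint")
    ip = resolve(hostname)  # Trusted blindly: this is the weak point.
    firewall.allow(ip)
    return ip


def honest_resolver(hostname):
    # Hypothetical address table for the example.
    return {"example.com": "192.0.2.10"}[hostname]


def compromised_resolver(hostname):
    # A compromised TCP/IP stack can return any address it likes.
    return "203.0.113.66"


if __name__ == "__main__":
    firewall = Firewall()
    open_connection("example.com", {"example.com"}, compromised_resolver, firewall)
    print(firewall.allowed_ips)
```

The host-name check passes in both cases, so the policy expressed in domain names is respected on paper while the firewall entry points wherever the resolver chose.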
CHERI systems use capabilities to authorise memory accesses. CHERIoT provides abstractions for software-defined capabilities that can authorise different operations. These are represented in the hardware via sealed (CHERI) capabilities that refer to specific kinds of objects. These sealed capabilities can be treated as opaque tokens (and cannot be directly used) by most code but can be unsealed by the compartment that owns the corresponding unsealing capability.
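As a rough mental model of the sealing behaviour described above, the following Python sketch treats a sealed capability as a token that only the holder of the matching key can open. The class names are hypothetical, and Python cannot actually enforce opacity the way CHERIoT hardware does, so this shows the intended behaviour only.

```python
class SealedCapability:
    """Opaque token: holders can pass it around but should not look inside."""

    __slots__ = ("_key", "_payload")

    def __init__(self, key, payload):
        self._key = key
        self._payload = payload


class SealingKey:
    """Stand-in for the unsealing capability owned by one compartment."""

    def seal(self, payload):
        return SealedCapability(self, payload)

    def unseal(self, token):
        # Only tokens sealed with this exact key can be opened.
        if not isinstance(token, SealedCapability) or token._key is not self:
            raise PermissionError("token was not sealed with this key")
        return token._payload


if __name__ == "__main__":
    network_key = SealingKey()
    token = network_key.seal({"host": "example.com", "port": 443})
    print(network_key.unseal(token))
```

Code that holds only the token can store it and hand it to the owning compartment, which is the sense in which it acts as an authorisation token.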
The network stack uses three kinds of sealed capabilities:
The flow for establishing a network connection is as follows:
At the end of this, the original caller can directly call the send and receive functions in the TCP/IP stack to send and receive data.
Each compartment needs to share some data with the others. For UDP, the receive path can be entirely zero copy (after the packet leaves the driver). A UDP packet arrives and is copied from the network interface into a new allocation. This is processed by the TCP/IP stack and then claimed with the allocator capability passed to the receive-message call, freed with the network stack's allocator capability, and returned. This ensures that the packet is freed once the caller frees it, transferring ownership out to the caller. Callers worried about time-of-check-to-time-of-use attacks from a compromised TCP/IP compartment may need to defensively copy.
BearSSL maintains its own send and receive buffers. The TCP/IP stack can copy directly to and from these, as long as we can make this secure. These are passed from the TLS compartment to the TCP/IP compartment as bounded capabilities with only load or store permissions. This means that the TCP/IP stack cannot access out of bounds and cannot capture the pointer. In the case of a store (read from the network), the TCP/IP compartment also cannot read stale data from the buffer (for example, it cannot read previously decrypted data).
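The effect of passing a bounded, store-only capability can be approximated in software. The sketch below is a Python toy, not CHERIoT code: `BoundedView` is a hypothetical class that checks bounds and permissions on every access. It does not model the "cannot capture the pointer" property, which has no Python equivalent.

```python
class BoundedView:
    """Toy stand-in for a bounded capability with restricted permissions."""

    def __init__(self, buffer, start, length, *, load=False, store=False):
        if start < 0 or length < 0 or start + length > len(buffer):
            raise ValueError("view exceeds the underlying buffer")
        self._buffer = buffer
        self._start = start
        self._length = length
        self._load = load
        self._store = store

    def _check(self, offset, size):
        if offset < 0 or size < 0 or offset + size > self._length:
            raise IndexError("access outside the bounds of the view")

    def read(self, offset, size):
        if not self._load:
            raise PermissionError("view has no load permission")
        self._check(offset, size)
        begin = self._start + offset
        return bytes(self._buffer[begin:begin + size])

    def write(self, offset, data):
        if not self._store:
            raise PermissionError("view has no store permission")
        self._check(offset, len(data))
        begin = self._start + offset
        self._buffer[begin:begin + len(data)] = data


if __name__ == "__main__":
    # First 8 bytes: previously decrypted data. Last 8: space for new input.
    tls_buffer = bytearray(b"secret--________")
    receive_view = BoundedView(tls_buffer, 8, 8, store=True)
    receive_view.write(0, b"incoming")
    print(tls_buffer)
```

With a store-only view, the receiving side can fill the buffer but any attempt to read it, or to write past the eight bytes it was given, raises an error.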
The network stack relies on some interfaces being restricted to certain compartments. For example, there are some APIs in the TCP/IP and Firewall compartments that should be exposed only to the Network API compartment. These can be checked by the cheriot-audit tool, with the aid of the policy in this repository.
The network_stack.rego file also makes it easy to extract connection capabilities.
For example, if you run the following Rego query against the HTTPS example (after loading network_stack.rego with -m):
data.network_stack.all_connection_capabilities
You should see the following output (piped through jq for pretty printing):
[
{
"capability": {
"connection_type": "UDP",
"host": "pool.ntp.org",
"port": 123
},
"owner": "SNTP"
},
{
"capability": {
"connection_type": "TCP",
"host": "example.com",
"port": 443
},
"owner": "https_example"
}
]
This tells you that the SNTP compartment has a capability that allows it to create a UDP socket and communicate with pool.ntp.org and that the https_example compartment can make TCP connections to example.com:443.
No other compartments can make connections and no compartment may communicate with hosts not on this list.
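One way to use this output in a release check is to compare it with an allow-list. The following is a minimal Python sketch under stated assumptions: the JSON is the output shown above, while `ALLOWED` and `unexpected_connections` are hypothetical names and are not part of cheriot-audit.

```python
import json

# Output of the all_connection_capabilities query shown above.
AUDIT_OUTPUT = """
[
  {"capability": {"connection_type": "UDP", "host": "pool.ntp.org", "port": 123},
   "owner": "SNTP"},
  {"capability": {"connection_type": "TCP", "host": "example.com", "port": 443},
   "owner": "https_example"}
]
"""

# Hypothetical allow-list for this firmware image: (owner, type, host, port).
ALLOWED = {
    ("SNTP", "UDP", "pool.ntp.org", 123),
    ("https_example", "TCP", "example.com", 443),
}


def unexpected_connections(audit_output, allowed):
    """Return every (owner, type, host, port) tuple that is not in `allowed`."""
    found = set()
    for entry in json.loads(audit_output):
        cap = entry["capability"]
        found.add((entry["owner"], cap["connection_type"], cap["host"], cap["port"]))
    return sorted(found - set(allowed))


if __name__ == "__main__":
    extra = unexpected_connections(AUDIT_OUTPUT, ALLOWED)
    print("unexpected connection capabilities:", extra)
```

An empty result means the firmware image holds no connection capability beyond those that were expected.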
This can feed into more auditing infrastructure.
Do you want to check that all TCP connections are encrypted? Try asking which compartments are calling the TCP connection function:
data.compartment.compartments_calling_export_matching("NetAPI", `network_socket_connect_tcp(.*)`)
Hopefully the output is very short:
["TLS"]
This means that all TCP connections are made via the TLS compartment and, unless the TLS compartment is compromised, no traffic can flow over TCP that is not encrypted. Unfortunately, SNTP is unencrypted, although it can carry verified signatures (which you absolutely should use in a real deployment: the current prototype just talks to pool.ntp.org without authentication). SNTP should be the only thing that uses UDP:
data.compartment.compartments_calling_export_matching("NetAPI", `network_socket_udp(.*)`)
["SNTP"]
Now that you know that the SNTP compartment is the only one that can send and receive UDP packets, it's worth checking that it really is talking to the host that you expect:
[ data.network_stack.decode_connection_capability(c) | c = input.compartments.SNTP.imports[_] ; data.network_stack.is_connection_capability(c) ]
[{"connection_type":"UDP", "host":"pool.ntp.org", "port":123}]
If you've modified the SNTP compartment to point to your NTP service and use its authentication credentials, then this should be different. This can all be part of your firmware's auditing policy.
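The three query results above could be combined into a single pass/fail check. This Python sketch assumes that the JSON text printed by each query has been captured as a string. The function name `audit_violations` and the messages are hypothetical.

```python
import json


def audit_violations(tcp_callers, udp_callers, sntp_endpoints, expected_sntp):
    """Compare captured query output with the policy described in the text.

    The first three arguments are the JSON text printed by the three queries;
    `expected_sntp` is the endpoint dictionary that SNTP is expected to hold.
    """
    violations = []
    if json.loads(tcp_callers) != ["TLS"]:
        violations.append("TCP sockets are created outside the TLS compartment")
    if json.loads(udp_callers) != ["SNTP"]:
        violations.append("UDP sockets are created outside the SNTP compartment")
    if json.loads(sntp_endpoints) != [expected_sntp]:
        violations.append("SNTP holds unexpected connection capabilities")
    return violations


if __name__ == "__main__":
    problems = audit_violations(
        '["TLS"]',
        '["SNTP"]',
        '[{"connection_type":"UDP", "host":"pool.ntp.org", "port":123}]',
        {"connection_type": "UDP", "host": "pool.ntp.org", "port": 123},
    )
    print(problems or "audit passed")
```

If you point SNTP at your own NTP service, only `expected_sntp` needs to change.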
We designed the TCP/IP stack to automatically and transparently restart on failure (e.g., a CHERI fault or an assertion). The restart procedure broadly works like this (simplified for didactic reasons):
network_socket_close) with an old socket from the previous instance of the network stack will be detected and failed with an -ENOTCONN code. This pushes callers to close the sockets and create new ones with the new instance of the TCP/IP stack.

The implementation details of the reset slightly deviate from this description.
See the technical documentation in tcpip_error_handler.h for a full perspective.
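One common way to detect sockets left over from a previous instance is to tag each handle with an epoch counter. The Python sketch below illustrates that idea only. It is not the mechanism documented in tcpip_error_handler.h; the `ToyNetworkStack` class is hypothetical, and letting `close` succeed on a stale handle is an assumption made for the example.

```python
import errno


class ToyNetworkStack:
    """Toy model of a stack that rejects sockets from before a reset."""

    def __init__(self):
        self.epoch = 0
        self._open = set()
        self._next_id = 0

    def socket(self):
        # The handle records the epoch in which it was created.
        handle = (self.epoch, self._next_id)
        self._next_id += 1
        self._open.add(handle)
        return handle

    def reset(self):
        """Model a restart: all state is dropped and the epoch advances."""
        self.epoch += 1
        self._open.clear()

    def send(self, handle, data):
        if handle[0] != self.epoch:
            return -errno.ENOTCONN  # Socket belongs to a previous instance.
        if handle not in self._open:
            return -errno.EBADF
        return len(data)

    def close(self, handle):
        # Assumption: closing a stale socket succeeds so callers can clean up.
        self._open.discard(handle)
        return 0


if __name__ == "__main__":
    stack = ToyNetworkStack()
    old = stack.socket()
    stack.reset()
    print(stack.send(old, b"hello") == -errno.ENOTCONN)
```

A caller that sees the error closes the old handle and opens a new socket, which then belongs to the current epoch.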
Note that the current implementation of the automatic reset makes a few assumptions:
These assumptions leave some attack surface to malicious actors. We are working on improvements to remove or weaken them.