Nheko-Reborn / nheko

Desktop client for Matrix using Qt and C++20.
https://nheko-reborn.github.io/
GNU General Public License v3.0

Security audit of the end-to-end encryption implementation #1087

Open DemiMarie opened 2 years ago

DemiMarie commented 2 years ago

The Problem

The README, as of 52944de2c6ad17f62dff20960caa9c6d9c3e742e, states that the E2EE implementation has never been audited and no guarantees can be made. While obviously Nheko comes with no warranty, it would be nice to have enough confidence in the code that this disclaimer can be removed from the README.

The Solution

Get a security review of the E2EE implementation, as well as reviewing the code for problems (memory unsafety, mostly) that could allow for RCE.

Alternatives

None

Additional context

No response

Happens in the latest version

deepbluev7 commented 2 years ago

Security audits are expensive. Like in the order of 10-200k€ (afaik). I do not have the money to pay for that. Are there some alternatives to paying for an audit?

DemiMarie commented 2 years ago

Not sure. If Mozilla was still funding them, that would be an option. Also offloading the actual logic to a well-regarded library (such as matrix-rust-sdk) might help.

deepbluev7 commented 2 years ago

Probably won't be using the rust sdk. I think you need independent implementations of crypto to find some of the protocol flaws (we found like 3 that way). It would also basically mean rewriting 3/4 of Nheko, which I really don't feel like doing. At that point it would probably be easier to write a completely new client :D

DemiMarie commented 2 years ago

That is a perfectly valid choice, and multiple independent implementations are definitely a good thing. That you found flaws in the protocol is reason enough to justify your decision.

When it comes to logical correctness, I wonder if there could be a shared test suite for Matrix implementations. It is very hard to have too much testing, and a common test suite could at least ensure that if the same flaw affects multiple implementations, all of them get fixed. From personal experience, I can say that writing test cases is tedious, time-consuming, and often not very fun, so a way to share the work across the entire Matrix community would be awesome.

From a more general hardening perspective, Nheko is at a disadvantage due to being written in a memory-unsafe language (C++). Therefore, most of my suggestions are going to be about reducing the likelihood and impact of memory safety holes.

The first suggestion is to replace the system memory allocator with either Daniel Micay’s hardened malloc or the Boehm conservative garbage collector. The first is designed to make heap exploits significantly more difficult, while the second can ensure that still-reachable memory does not get freed. It might be possible to combine these two to get the best of both worlds. (@thestinger, do you have suggestions?)

The second suggestion is to ensure that you are using bounds-checked operations (such as .at() over operator[]), enabling standard library assertions (-D_GLIBCXX_ASSERTIONS -D_FORTIFY_SOURCE=2), and enabling minimal-runtime UBSAN to catch integer overflows and similar. These all have low overhead and are intended for use in production environments. Following the C++ Core Guidelines ought to be quite helpful, as it claims to ensure lifetime and bounds safety, and UBSAN can catch all of the remaining sources of undefined behavior I know of. There could, of course, be ones I do not know of, and I am not sure how well Qt fits in such an environment.
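
To make the second suggestion concrete, here is a minimal sketch of the bounds-checked style (the function and container are illustrative, not actual Nheko code; the flags in the comment are the ones mentioned above):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Illustrative hardening flags from above (GCC/Clang); _FORTIFY_SOURCE needs
// optimization enabled to do anything:
//   -O2 -D_GLIBCXX_ASSERTIONS -D_FORTIFY_SOURCE=2

int checked_lookup(const std::vector<int> &values, std::size_t i)
{
    // .at() throws std::out_of_range instead of reading past the end;
    // operator[] with an out-of-range index is undefined behavior.
    try {
        return values.at(i);
    } catch (const std::out_of_range &) {
        return -1; // handle the bad index explicitly instead of corrupting memory
    }
}
```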

Finally, please avoid QtWebEngine. It’s a significantly patched version of Chromium and does not get sufficiently prompt security patches. Debian explicitly excludes it from security support. If you are able to ensure that the browser only displays trusted content, you might be able to get away with WebKitGTK+, even though you were probably hoping to avoid it. That means avoiding third-party widgets.

Edit: fix @thestinger’s last name

thestinger commented 2 years ago

Following the C++ Core Guidelines ought to be quite helpful, as it claims to ensure lifetime and bounds safety

It doesn't actually do that.

thestinger commented 2 years ago

The first suggestion is to replace the system memory allocator with either Daniel McKay’s hardened malloc or the Boehm conservative garbage collector. The first is designed to make heap exploits significantly more difficult, while the second can ensure that still-reachable memory does not get freed. It might be possible to combine these two to get the best of both worlds. (@thestinger do you have suggestions?).

You can't combine them. It would prevent exploiting use-after-free but Boehm GC comes at a very high cost. It also doesn't make an attempt to mitigate spatial safety issues, only temporal ones.

Boehm GC is a single threaded, conservative, non-moving (non-compacting), non-incremental (stop-the-world) tracing GC. It has to stop the entire process while it traces most of the memory allocated by the process. It can't compact memory like a modern GC. You can have safety issues with it if you have pointers outside of the memory it traces by using OS memory allocation directly instead of going through Boehm implementations wrapping them, etc. It leaks memory due to being conservative, although this is less of an issue on 64-bit due to much lower chance of integers randomly overlapping with the used address space but it's still an issue worth noting. I don't think using it is acceptable for the vast majority of projects. It's nothing like a modern GC implementation for a language with actual support for it.

Note: my last name is Micay, not McKay. It doesn't have an Irish/Scottish origin.

DemiMarie commented 2 years ago

Following the C++ Core Guidelines ought to be quite helpful, as it claims to ensure lifetime and bounds safety

It doesn't actually do that.

Ouch. Someone ought to tell Standard C++ Foundation about that; the document claimed that it does do this last I checked.

Note: my last name is Micay, not McKay. It doesn't have an Irish/Scottish origin.

Sorry about that, fixed.

You can't combine them. It would prevent exploiting use-after-free but Boehm GC comes at a very high cost. It also doesn't make an attempt to mitigate spatial safety issues, only temporal ones,

My hope was that by using appropriate bounds-checking operations, spatial safety could be mitigated, leaving temporal safety as the main problem. I was inspired by MemGC in the Trident rendering engine before it was replaced by Chromium.

Boehm GC is a single threaded, conservative, non-moving (non-compacting), non-incremental (stop-the-world) tracing GC. It has to stop the entire process while it traces most of the memory allocated by the process.

Yeah, that is going to be a problem for anything interactive with a nontrivial amount of memory. There was a paper (CETS, I believe) that claimed to provide temporal safety for C, but it caused a huge slowdown, only supported single-threaded programs, and broke the world ABI-wise. CHERI could help but requires currently-nonexistent hardware.

Looks like the only languages I know that can handle temporal memory safety are Rust, ATS, Ada, and GC’d ones. Worse, ensuring data races do not cause memory corruption seems to require either catching them at compile-time (Rust, SPARK) or banning fat pointers from escaping to other threads (.NET, Java).

I wonder if Qt will be rewritten in Rust. (Only half-joking!)

deepbluev7 commented 2 years ago

So I would dare to argue that in our crypto code it is very unlikely that allocations will lead to use-after-frees and similar security issues. This is because almost all of our crypto code exclusively uses unique_ptrs, uses very short functions, and bounces around between only 2 threads (in most cases just one). So arguably a garbage collector wouldn't help much. As bigger risk factors I see:

  • Integer over/underflows
  • logic errors
  • Buffer overruns or errors otherwise with statically allocated/stack variables
  • Memory unsafety issues in other parts of the application allowing to compromise the crypto code

Which might make a GC worth it (or a hardened allocator), but a big part of our memory issues have actually been in the interaction with Qml ownership, which I am not sure a GC will deal with properly.

thestinger commented 2 years ago

Ouch. Someone ought to tell Standard C++ Foundation about that; the document claimed that it does do this last I checked.

They don't really care. The document is highly inaccurate on that front but they present it as something that can be gradually fixed over time. Push an element to a vector while iterating over it and you can have a use-after-free, etc. and it's not like they propose avoiding using references, iterators, views and other things which can become dangling.
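
To make the vector example concrete, a minimal sketch of the kind of code the guidelines do not reject (illustrative, not from any real codebase):

```cpp
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3};
    for (int &x : v) {       // the range-for holds iterators into v's current buffer
        if (x == 1)
            v.push_back(4);  // may reallocate: x and the hidden iterators now dangle
        x += 10;             // use-after-free of the old buffer if reallocation happened
    }
    return 0;
}
```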

DemiMarie commented 2 years ago

Ouch. Someone ought to tell Standard C++ Foundation about that; the document claimed that it does do this last I checked.

They don't really care. The document is highly inaccurate on that front but they present it as something that can be gradually fixed over time. Push an element to a vector while iterating over it and you can have a use-after-free, etc. and it's not like they propose avoiding using references, iterators, views and other things which can become dangling.

Could a borrow checker be added as a compiler plugin? It is well-known that adding one purely as a library does not work.

  • Integer over/underflows

These can actually be solved with minimal-runtime UBSAN. You will need to annotate cases where you are intentionally relying on unsigned integer wrapping (see the sketch at the end of this comment).

  • logic errors

This is not something Rust solves, but a comprehensive test suite across multiple implementations might.

  • Buffer overruns or errors otherwise with statically allocated/stack variables

Stack canaries can help with these, and I believe there may be ways for UBSAN to catch them in some cases.

  • Memory unsafety issues in other parts of the application allowing to compromise the crypto code

That is indeed my main worry.

Which might make a GC worth it (or a hardened allocator), but a big part of our memory issues have actually been in the interaction with Qml ownership, which I am not sure a GC will deal with properly.

A hardened allocator would make exploitation significantly more difficult, but a GC probably won’t solve this, in addition to causing a major hit to overall responsiveness.
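
As mentioned under the integer over/underflow point above, a minimal sketch of what annotating intentional wrapping could look like with Clang's minimal-runtime UBSAN (the function and build flags are illustrative, not Nheko's actual code or build):

```cpp
#include <cstdint>

// Illustrative build flags (Clang):
//   -fsanitize=signed-integer-overflow,unsigned-integer-overflow
//   -fsanitize-minimal-runtime
// Unintended overflows are then reported with a minimal diagnostic suitable
// for production builds.

// This function wraps on purpose, so opt it out of the unsigned overflow check
// to keep the sanitizer quiet about intentional modular arithmetic.
__attribute__((no_sanitize("unsigned-integer-overflow")))
std::uint32_t rolling_hash(std::uint32_t h, std::uint8_t byte)
{
    return h * 31u + byte; // wraps modulo 2^32 by design
}
```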

thestinger commented 2 years ago

-fsanitize=bounds,object-size -fsanitize-trap=bounds,object-size is usable in production with Clang.

bounds sanitizer only catches issues for fixed-size C arrays that are used directly, not through a pointer / arbitrary size array type. Checks are inserted by the frontend and there's zero analysis, etc.

object-size sanitizer is essentially _FORTIFY_SOURCE=2 for pointer accesses, but it can be smarter than the badly aged glibc implementation of _FORTIFY_SOURCE which doesn't support catching read overflows and doesn't use newer features like support for non-constant sizes. This feature is FAR more limited than it may appear.

Most of this will largely not add checks to modern C++ code due to the layers of abstractions and it does very little in most C code. object-size sanitizer is using local analysis and needs to be able to see where the object came from to determine the size along with getting defeated by aliasing, etc. where the pointer could have been changed. It doesn't do as much as you would expect.
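
A minimal sketch of that limitation (illustrative code; built with the flags quoted above):

```cpp
// Build (Clang): -O2 -fsanitize=bounds,object-size -fsanitize-trap=bounds,object-size

int fixed[8];

int direct(int i)
{
    return fixed[i]; // the frontend sees a fixed-size array here, so
                     // -fsanitize=bounds can insert a check
}

int through_pointer(int *p, int i)
{
    return p[i];     // only an int* is visible: the object-size check folds away
                     // because the allocation size is unknown, and the bounds
                     // sanitizer doesn't apply to plain pointers
}
```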

Stack canaries can help with these, and I believe there may be ways for UBSAN to catch them in some cases.

Stack canaries are a very poorly aged mitigation mostly only working against sequential overflows, and most overflows are on the heap not the stack. Type-based CFI and ShadowCallStack are more useful modern mitigations but badly supported with near zero adoption outside of Android. Chromium uses a small portion of type-based CFI on Linux but that's almost the only use outside Android.

DemiMarie commented 2 years ago

Stack canaries can help with these, and I believe there may be ways for UBSAN to catch them in some cases.

Stack canaries are a very poorly aged mitigation mostly only working against sequential overflows, and most overflows are on the heap not the stack. Type-based CFI and ShadowCallStack are more useful modern mitigations but badly supported with near zero adoption outside of Android. Chromium uses a small portion of type-based CFI on Linux but that's almost the only use outside Android.

Is this because they require whole-program optimization and static linking?

thestinger commented 2 years ago

ShadowCallStack doesn't but it needs libc integration for proper support and to be able to apply it to dynamic libraries. It's comparable to using SSP (stack canaries) with more work required from libc but not developers.

Type-based CFI requires LTO for the executable and each shared object. It has higher overhead for checks across those units and it provides more precision based on knowing more functions aren't ever indirectly called if you build in all the libraries with LTO. You don't need to use static linking, but LTO across libraries does improve the security it provides. Type-based CFI also requires libc integration for proper support. Chromium deals with this themselves for Linux but it's not as good as it would be if it was integrated in libc.

The main reason these aren't broadly used is because people in the desktop Linux world, etc. are still using GCC despite Clang having faster compile time and producing slightly better code on x86_64 and far better on arm64.

Type-based CFI also requires fixing common undefined behavior in C code with function pointers where type of the callee (the function) doesn't match the type used by the caller (function pointer).
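
A minimal sketch of the kind of function-pointer type mismatch meant here (illustrative code); Clang's -fsanitize=cfi-icall, built with -flto and -fvisibility=hidden, would trap at the indirect call:

```cpp
#include <cstdio>

// The callback table expects int (*)(void *).
using callback_t = int (*)(void *);

static int handler(const char *msg) // but the callee takes const char *
{
    std::puts(msg);
    return 0;
}

int main()
{
    // This "works" on common ABIs today, but calling through a function pointer
    // whose type doesn't match the callee is undefined behavior, and type-based
    // CFI rejects it because the signatures differ.
    callback_t cb = reinterpret_cast<callback_t>(&handler);
    return cb(const_cast<char *>("hello"));
}
```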

deepbluev7 commented 2 years ago

Ah, one thing I forgot to mention: for the test suite there is a work-in-progress test client: https://test.uhoreg.ca/

I think this could be extended to test at least the common cases of key sharing and key permissions, which would cover the common logic errors, especially since it can spoof a malicious homeserver and such.

CFI not being supported in GCC at the moment is an issue (and I think CFI also requires some annotations to work well). While clang is pretty good, development on it has slowed down a lot recently, its compile-time advantage isn't as big as it used to be, and we will never get some distributions to use it because of ideological issues (which I partially agree with). We could however look into enabling common fortifications on the compilers where they are supported (which includes the Windows CFG stuff, or whatever it is called). There are also some people looking into adding a partial borrow checker to C++ (several independent projects, in fact). There might be a stricter subset of C++ that we could try to switch to at some point with less effort (since Nheko already uses value or non-mutable reference semantics in most of the Matrix code anyway).

I guess I need to look at which fortifications can be enabled on which compilers with a reasonable performance impact, and then we could add those to the default build flags. My local builds already rotate between address and other sanitizers constantly (I pretty much never run without one), so I would probably have noticed most of the common cases already, and many distros enable at least the FORTIFY stuff and stack guards by default, but we could still define reasonable defaults ourselves.

DemiMarie commented 2 years ago

That is good. On a positive note, I can tell that you are taking security seriously and that you want to make Nheko as secure and safe as possible.

LorenDB commented 1 year ago

Looking at this issue again, I would like to point something out:

Looks like the only languages I know that can handle temporal memory safety are Rust, ATS, Ada, and GC’d ones.

D is another good option: it has (opt-in) memory safety, and although it's a GCed language by default, the GC can be disabled by using the @nogc property. Furthermore, it has native C++ interop (just use extern(C++)), and there are at least a few C++ wrappers in the standard library (e.g. std::string). I do know of at least one binding generator tool, although I think the one linked is aimed at creating D interfaces for C++ libraries.

I wonder if Qt will be rewritten in Rust. (Only half-joking!)

Offtopic for this comment, but: I'd say not in the short term, but in the long term I wouldn't be surprised to see Qt start abstracting some things to the point that they could easily replace the backend with Rust while not breaking existing C++ code. I just hope that when they do that, they do it in such a way that it makes creating other bindings easier as well.

foresto commented 1 year ago

I have read that Nim also has optional GC, as well as several different GCs to choose from. A Qt binding for Nim recently started.

McSinyx commented 10 months ago

Security audits are expensive. Like in the order of 10-200k€ (afaik).

Is it within the bounds of NLnet funding?

thestinger commented 10 months ago

Not all languages with GC are memory safe.

thestinger commented 10 months ago

@deepbluev7 Stack canaries are barely useful in 2023 and the way GCC/Clang implement them is awful. They only provide a weak defense against linear stack buffer overflows, which are a tiny portion of overall memory corruption bugs; those are mostly heap memory corruption and mostly not limited to linear overflows. Type-based CFI is not a modern mitigation at this point, and while it should be deployed, it is not at all the bleeding edge. The bleeding edge is integrating hardware memory tagging support for both heap and stack allocations.

FORTIFY_SOURCE is an extremely limited feature for catching a small subset of overflows for certain libc functions. It can be expanded to checking buffer reads rather than only buffer writes, along with being expanded to cover cases that are not entirely static via the dynamic size feature that I proposed, but it's still only able to catch a small portion of the overflows for those specific functions, and those specific functions are not where most memory corruption bugs are. There are the bounds sanitizers for doing the same kind of thing for pointer/array usage, but they are limited in the same way and almost no one is doing that outside Android. None of this really makes a major difference in the way that memory tagging can.

Full ASLR (PIE), stack canaries (strong SSP), full RELRO, etc. are the basic early 2000s era mitigations. Using Clang's type-based CFI features brings things more to a 2014 era expectation. Automatic zero variable initialization, trapping bounds sanitizer, trapping integer sanitizers, etc. brings things more to a 2020 era expectation.

There's a lot of stuff libc should do that's not done by glibc. glibc malloc doesn't have even a very basic hardened design, yet doesn't have low fragmentation or great performance either, but they have a major NIH issue and are stuck in their ways, so it's not likely to get better. It makes sense to have a choice between high performance and high security, with some choices in between, but they provide none of that as is.

Memory tagging is what's beginning to be deployed now.

Extremely coarse-grained CFI (CFG, CET, BTI) barely helps. The shadow stack is the very useful part of CET, not the coarse-grained CFI part which barely does anything.

thestinger commented 10 months ago

Also, currently, memory safe languages are extremely widespread, very usable and don't come with much or any performance cost depending on the choice. Gradual introduction of memory safe languages is something many projects with large legacy codebases are doing now. https://security.googleblog.com/2022/12/memory-safe-languages-in-android-13.html was the progress on that for Android in last year's release at a low level, and more was made for Android 14. It has always heavily used memory safe languages for high level code though, just not at a low level. A lot of the systems programming was already done in memory safe languages (Java, but increasingly Kotlin instead) since it doesn't really need to avoid GC, etc.