alexcrichton / curl-rust

Rust bindings to libcurl
MIT License
1.02k stars 235 forks source link

curl_global_init footgun #333

Closed sagebind closed 4 years ago

sagebind commented 4 years ago

I've decided to open this issue here for discussion.

An Isahc user brought to my attention this week a double-free error occurring in a relatively simple program (issue https://github.com/sagebind/isahc/issues/189), which I managed to simplify even further to a program like this:

fn main() {
    let threads = 8;
    let mut handles = Vec::new();

    for _ in 0..threads {
        handles.push(std::thread::spawn(move || {
            let mut easy = curl::easy::Easy::new();
            easy.url("http://localhost:8080").unwrap();
            easy.perform().unwrap();
        }));
    }

    for handle in handles  {
        handle.join().unwrap();
    }
}

Depending on which versions of curl is used and which SSL engine, the above program can randomly crash with a SIGABRT on a double-free error; in particular, this is with libcurl 7.54.0 and LibreSSL 2.6.5, both system defaults on macOS 10.14 (and likely other versions as well), so the chance of using these versions is pretty high on that platform.

Obtaining a backtrace when such a crash occurs reveals that the bad call of free() is coming from LibreSSL from inside a call to curl_easy_cleanup(), but it is not necessarily a bug. In our example program, curl_global_init() is getting called by curl::init(), which is lazily called once by Easy::new(). This is violating the rules of this function according to the libcurl documentation (snipped the most relevant bits):

The basic rule for constructing a program that uses libcurl is this: Call curl_global_init, with a _CURL_GLOBALALL argument, immediately after the program starts, while it is still only one thread and before it uses libcurl at all. Call curl_global_cleanup immediately before the program exits, when the program is again only one thread and after its last use of libcurl.

[...]

It isn't actually required that the functions be called at the beginning and end of the program -- that's just usually the easiest way to do it. It is required that the functions be called when no other thread in the program is running.

These global constant functions are not thread safe, so you must not call them when any other thread in the program is running. It isn't good enough that no other thread is using libcurl at the time, because these functions internally call similar functions of other libraries, and those functions are similarly thread-unsafe. You can't generally know what these libraries are, or whether other threads are using them.

The thread-unsafety of these functions is more than just "you must protect it with a mutex", but an even stronger requirement that no other threads must be running, which for all intents and purposes means the main thread. So because the first time we use anything from the curl crate is inside a thread other than main, all bets are off and LibreSSL is "free" (bad pun) to have this bug, because we violated the rules of calling curl_global_init(), and in turn violated the rules of SSL_library_init() or whatever init routines LibreSSL has.

Now a simple fix to this program is to add curl::init() to the first line of our main function, but its certainly not obvious that this is required and does not encourage users towards the pit of success. In addition, the above program works just fine with the in-tree libcurl and a new OpenSSL, so users might not ever think of this problem if 99% of the time everything works just fine without explicitly calling init() from the main thread.

This is especially a concern for libraries that depend on curl, which also need to help the user call init() properly even if curl is not directly exposed. I don't have any perfect answers to this problem, but here are some ideas:

Note that I am tempted to experiment with the third option downstream even if it is not considered for the curl crate proper.

Some prior discussion occurred in #157, but the objective was a bit different.

alexcrichton commented 4 years ago

Thanks for the detailed report here! I definitely didn't read the docs closely enough and wow is this a basically impossible requirement to place on consumers of the library. What in the world are we supposed to do in the face of thread unsafety like this?!

In any case I think that your idea in https://github.com/alexcrichton/curl-rust/pull/334 is a pretty plausible way to solve this. I think it's fine to have almost all platforms covered by default there and we can still have the lazy-init if truly needed for other platforms.

One thing I'd like to do first though is to reproduce this. I ran this as cargo run from this repository on Linux and macOS but wasn't able to get a segfault or any error. Are you able to capture a full stack trace of all running threads when the crash happens? Or is there perhaps a more constrained environment you can provide to reproduce? (e.g. does the server matter?)

sagebind commented 4 years ago

wow is this a basically impossible requirement to place on consumers of the library.

To be fair, the author of curl has mentioned that this requirement were mostly put there by the TLS implementations and not curl itself. It also seems that this scenario is maybe lessening according to a recent blog post: https://daniel.haxx.se/blog/2020/03/01/imagining-a-thread-safe-curl_global_init/. Sadly we still have to support older curl versions, even if this requirement were lifted. For example, with this combo I'm definitely putting the blame on LibreSSL. 😞

One thing I'd like to do first though is to reproduce this. I ran this as cargo run from this repository on Linux and macOS but wasn't able to get a segfault or any error.

I was only able to reproduce on macOS when using the system libcurl version 7.54.0 and the system LibreSSL 2.6.5. Even then since the "bug" is probably a race condition in LibreSSL that happens when not init'd properly, I could only reproduce with this combination around 1/100 runs. Though @blinsay was able to reproduce on his machine much more frequently.

Are you able to capture a full stack trace of all running threads when the crash happens?

@blinsay has a backtrace of just the offending thread here: https://gist.github.com/blinsay/3d5174aaa03f5546cd492054889d8789. Unfortunately it seems like the system libs lack debug symbols or something as I can't get a meaningful backtrace inside libcurl or LibreSSL. I will attempt to get a backtrace of all threads today.

alexcrichton commented 4 years ago

Ah yeah I don't mean to really blame anyone per se, it's just that this isn't the only C library that's had these sorts of restrictions. OpenSSL was indeed quite a pain in the past but at least it's fixed now! In any case that looks like a promising blog post!

The backtrace there I think is just of the faulting thread and would need bt all to get all the threads. I'm surprised that the fault, if related to global init, happens during cleanup...

sagebind commented 4 years ago

Perusing the source code for LibreSSL 2.6.5, I see a lot of suspicious thread-local hashing magic going on: https://github.com/libressl-portable/openbsd/tree/libressl-v2.6.5. These are the sort of things that OpenSSL eventually fixed (and LibreSSL 3 maybe?). If I had to guess, LibreSSL's thread hashing tables are not init'd correctly unless called from the main thread, causing weird bugs later.

alexcrichton commented 4 years ago

Hm so it's not necessarily a concurrency issue, it's a main thread issue? TBH I don't even know how you detect the "main thread" on Unix...

sagebind commented 4 years ago

Yeah basically. I know GUI FFI has this problem from time to time as well. Like the macOS native GUI libraries must be called on the main thread. IIRC no one could figure out how to safely enforce that. On iOS there's a system function that knows if the current thread is the main one I think.

blinsay commented 4 years ago

The backtrace there I think is just of the faulting thread and would need bt all to get all the threads. I'm surprised that the fault, if related to global init, happens during cleanup...

Here's bt all with the same code as was in the previous gist.

https://gist.github.com/blinsay/3c6f1dd8dcf0f829faccee2fff0ccdec

Let me know if anything else would be useful.

alexcrichton commented 4 years ago

Bah unfortunately not as illuminating as I thought it would be. Thanks though!

sagebind commented 4 years ago

I can't think of anything else we could possibly do to handle this better beyond #334, so we can probably close this out. Doing some brief research it doesn't seem feasible to detect whether init is called on a thread other than main (besides checking the thread name).