Closed sagebind closed 4 years ago
Thanks for the detailed report here! I definitely didn't read the docs closely enough and wow is this a basically impossible requirement to place on consumers of the library. What in the world are we supposed to do in the face of thread unsafety like this?!
In any case I think that your idea in https://github.com/alexcrichton/curl-rust/pull/334 is a pretty plausible way to solve this. I think it's fine to have almost all platforms covered by default there and we can still have the lazy-init if truly needed for other platforms.
One thing I'd like to do first though is to reproduce this. I ran this as cargo run
from this repository on Linux and macOS but wasn't able to get a segfault or any error. Are you able to capture a full stack trace of all running threads when the crash happens? Or is there perhaps a more constrained environment you can provide to reproduce? (e.g. does the server matter?)
wow is this a basically impossible requirement to place on consumers of the library.
To be fair, the author of curl has mentioned that this requirement were mostly put there by the TLS implementations and not curl itself. It also seems that this scenario is maybe lessening according to a recent blog post: https://daniel.haxx.se/blog/2020/03/01/imagining-a-thread-safe-curl_global_init/. Sadly we still have to support older curl versions, even if this requirement were lifted. For example, with this combo I'm definitely putting the blame on LibreSSL. 😞
One thing I'd like to do first though is to reproduce this. I ran this as
cargo run
from this repository on Linux and macOS but wasn't able to get a segfault or any error.
I was only able to reproduce on macOS when using the system libcurl version 7.54.0 and the system LibreSSL 2.6.5. Even then since the "bug" is probably a race condition in LibreSSL that happens when not init'd properly, I could only reproduce with this combination around 1/100 runs. Though @blinsay was able to reproduce on his machine much more frequently.
Are you able to capture a full stack trace of all running threads when the crash happens?
@blinsay has a backtrace of just the offending thread here: https://gist.github.com/blinsay/3d5174aaa03f5546cd492054889d8789. Unfortunately it seems like the system libs lack debug symbols or something as I can't get a meaningful backtrace inside libcurl or LibreSSL. I will attempt to get a backtrace of all threads today.
Ah yeah I don't mean to really blame anyone per se, it's just that this isn't the only C library that's had these sorts of restrictions. OpenSSL was indeed quite a pain in the past but at least it's fixed now! In any case that looks like a promising blog post!
The backtrace there I think is just of the faulting thread and would need bt all
to get all the threads. I'm surprised that the fault, if related to global init, happens during cleanup...
Perusing the source code for LibreSSL 2.6.5, I see a lot of suspicious thread-local hashing magic going on: https://github.com/libressl-portable/openbsd/tree/libressl-v2.6.5. These are the sort of things that OpenSSL eventually fixed (and LibreSSL 3 maybe?). If I had to guess, LibreSSL's thread hashing tables are not init'd correctly unless called from the main thread, causing weird bugs later.
Hm so it's not necessarily a concurrency issue, it's a main thread issue? TBH I don't even know how you detect the "main thread" on Unix...
Yeah basically. I know GUI FFI has this problem from time to time as well. Like the macOS native GUI libraries must be called on the main thread. IIRC no one could figure out how to safely enforce that. On iOS there's a system function that knows if the current thread is the main one I think.
The backtrace there I think is just of the faulting thread and would need bt all to get all the threads. I'm surprised that the fault, if related to global init, happens during cleanup...
Here's bt all
with the same code as was in the previous gist.
https://gist.github.com/blinsay/3c6f1dd8dcf0f829faccee2fff0ccdec
Let me know if anything else would be useful.
Bah unfortunately not as illuminating as I thought it would be. Thanks though!
I can't think of anything else we could possibly do to handle this better beyond #334, so we can probably close this out. Doing some brief research it doesn't seem feasible to detect whether init is called on a thread other than main (besides checking the thread name).
I've decided to open this issue here for discussion.
An Isahc user brought to my attention this week a double-free error occurring in a relatively simple program (issue https://github.com/sagebind/isahc/issues/189), which I managed to simplify even further to a program like this:
Depending on which versions of curl is used and which SSL engine, the above program can randomly crash with a SIGABRT on a double-free error; in particular, this is with libcurl 7.54.0 and LibreSSL 2.6.5, both system defaults on macOS 10.14 (and likely other versions as well), so the chance of using these versions is pretty high on that platform.
Obtaining a backtrace when such a crash occurs reveals that the bad call of
free()
is coming from LibreSSL from inside a call tocurl_easy_cleanup()
, but it is not necessarily a bug. In our example program,curl_global_init()
is getting called bycurl::init()
, which is lazily called once byEasy::new()
. This is violating the rules of this function according to the libcurl documentation (snipped the most relevant bits):The thread-unsafety of these functions is more than just "you must protect it with a mutex", but an even stronger requirement that no other threads must be running, which for all intents and purposes means the main thread. So because the first time we use anything from the curl crate is inside a thread other than
main
, all bets are off and LibreSSL is "free" (bad pun) to have this bug, because we violated the rules of callingcurl_global_init()
, and in turn violated the rules ofSSL_library_init()
or whatever init routines LibreSSL has.Now a simple fix to this program is to add
curl::init()
to the first line of our main function, but its certainly not obvious that this is required and does not encourage users towards the pit of success. In addition, the above program works just fine with the in-tree libcurl and a new OpenSSL, so users might not ever think of this problem if 99% of the time everything works just fine without explicitly callinginit()
from the main thread.This is especially a concern for libraries that depend on curl, which also need to help the user call
init()
properly even if curl is not directly exposed. I don't have any perfect answers to this problem, but here are some ideas:init()
more clear in the documentation, and encourage any downstream libraries that depend on curl to re-export or provide their owninit()
functions as well. This pretty much amounts to what is suggested in the libcurl docs.init()
unconditionally panic if called on any thread besides the main thread. This would make it obvious during development if this requirement is being violated, but I have no idea if we can check if the current thread is the main one or not.init()
for the user so that on such platforms we do the right thing automatically.Note that I am tempted to experiment with the third option downstream even if it is not considered for the curl crate proper.
Some prior discussion occurred in #157, but the objective was a bit different.