Closed: nlw0 closed this issue 2 years ago.
Hmmmm.....looking through the involved code, it doesn't seem like there's anything obviously non-multithread-safe. Though I also can't find documentation that the call to `mbedtls_pk_sign` in the mbedtls C library is threadsafe. So possibly? It's possible that putting a lock in `JSONCredentials` around the `private_key` field might help; probably worth a shot.
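A minimal sketch of the "lock around the private key" idea, in case it helps anyone trying it. `LockedCredentials`, `with_private_key` and the placeholder signing function are hypothetical names made up for illustration; they are not part of GoogleCloud.jl or MbedTLS.jl, and the real signing call would go where the placeholder is.

```julia
# Hypothetical stand-in for a credentials object; NOT GoogleCloud.jl's actual type.
struct LockedCredentials
    private_key::Vector{UInt8}
    lock::ReentrantLock
end

# Convenience constructor that creates a fresh lock for the key.
LockedCredentials(key::Vector{UInt8}) = LockedCredentials(key, ReentrantLock())

# Run `f(private_key)` while holding the lock, so only one task at a time
# can reach the (possibly non-thread-safe) signing code.
function with_private_key(f, creds::LockedCredentials)
    lock(creds.lock) do
        f(creds.private_key)
    end
end

creds = LockedCredentials(UInt8[1, 2, 3])

# Each task goes through the lock; the lambda is a placeholder for real signing.
tasks = map(1:8) do i
    Threads.@spawn with_private_key(key -> Int(sum(key)) + i, creds)
end
results = fetch.(tasks)
```

This only serializes code that goes through `with_private_key`; any other path that touches the same C state without taking the lock would still be unprotected.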
HTTP.jl itself has had threadsafety issues in the 0.9 releases; the 1.0 release just came out that should resolve those, but I don't think GoogleCloud has updated for compatibility there yet (hopefully not too hard). But it doesn't look like this particular code path deals w/ HTTP.jl at all.
Thanks for taking a look, @quinnj. I'm looking at GoogleCloud.jl/src/session.jl:26, and considering I have a shared global `GoogleSession` object, I'm wondering if this can cause a problem. One attempt at using locks hasn't solved it, but maybe I need to change the approach. I'm not familiar with how TLS works, but maybe there's a random generator state that cannot be shared across threads? I'm not sure which parts of my code should go inside a mutex in that case, or if I should have separate `GoogleSession` objects. I'm not sure that'd even be possible, so it's confusing.
And that might not be the whole story, because sometimes I get a segfault or a double-free error, the kind of thing we would really call non-thread-safe. But that's more difficult to reproduce.
GoogleCloud.jl methods can take an optional `session` argument; I believe using that with a new session in every thread has solved my problem. Thanks for the help!
I thought the problem was gone after I started using separate session objects, but I've actually got the issue again. I'll try to come up with a minimal example...
I have created a short script that reproduces the issue, here's how it goes:
    using GoogleCloud
    using MD5, Base64

    goocredentials = JSONCredentials("mycreds-123443211234.json")
    mybucket = "mybucket-1234"
    mypaths = ["..." for n in 1:23] ## 54 different files

    mydata = map(mypaths) do path
        # begin
        # @async begin
        Threads.@spawn begin
            goosession = GoogleSession(goocredentials, ["devstorage.full_control"])
            data = storage(:Object, :get, mybucket, path, session=goosession)
        end
    end

    allbytes = reduce(vcat, fetch.(mydata))
    @show base64encode(md5(allbytes))
Running with `julia -t1` it works, and using either `begin` or `@async begin` it works. With `Threads.@spawn` it reliably breaks with `julia -t2`.
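For context on why the `@async` variant behaves differently: `@async` tasks are sticky, so they run on the thread that created them, while `Threads.@spawn` tasks may be scheduled on any available thread. A small sketch that makes this visible using only Base (nothing here touches GoogleCloud.jl or MbedTLS.jl):

```julia
# Thread id of the task that launches everything.
main_id = Threads.threadid()

# `@async` tasks stay on the launching thread, so they interleave but never
# run in parallel with each other.
async_ids = fetch.(map(1:8) do _
    @async Threads.threadid()
end)

# `Threads.@spawn` tasks may land on any thread when Julia is started with
# more than one (e.g. `julia -t2`), which is where data races can appear.
spawn_ids = fetch.(map(1:8) do _
    Threads.@spawn Threads.threadid()
end)

@show main_id async_ids spawn_ids
```

So the `@async` runs succeeding does not rule out a thread-safety problem; it just means the code never actually ran on two threads at once.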
I'm sorry I can't share my data, I can try to set up something public if it helps.
It's not always the same error. The one I see most often in my real code is `nested task error: MbedTLS error code -17280: RSA - The PKCS#1 verification failed`. Other than that, I get things that look like memory allocation errors, such as `free(): corrupted unsorted chunks`, `signal (11): Segmentation fault` and `malloc(): smallbin double linked list corrupted`.
This looks to me like something in MbedTLS not being thread-safe, even with separate `GoogleSession` objects.
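Since the failure mode varies from run to run, one way to get a clearer picture is to record the outcome of every task instead of stopping at the first failed `fetch`. The sketch below is only an illustration: `run_and_collect`, `tally_errors` and `flaky` are hypothetical helpers, and `flaky` simulates failures rather than calling GoogleCloud.jl. It can only count Julia exceptions; a segfault or heap corruption kills the process and cannot be caught this way.

```julia
# Run `f(x)` for every item on its own task and record either the result
# or the exception that was thrown.
function run_and_collect(f, items)
    tasks = map(items) do x
        Threads.@spawn begin
            try
                (ok = true, value = f(x))
            catch err
                (ok = false, value = err)
            end
        end
    end
    return fetch.(tasks)
end

# Count failures by exception type.
function tally_errors(outcomes)
    counts = Dict{String,Int}()
    for o in outcomes
        o.ok && continue
        name = string(typeof(o.value))
        counts[name] = get(counts, name, 0) + 1
    end
    return counts
end

# Placeholder workload that fails for every third item.
flaky(i) = i % 3 == 0 ? error("simulated failure") : i

outcomes = run_and_collect(flaky, 1:9)
error_counts = tally_errors(outcomes)
@show error_counts
```

In the real script, `flaky` would be replaced by the function that creates the session and downloads one object.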
I don't know why I didn't see this before, but there really seem to be known issues with MbedTLS in multiple threads: https://github.com/Mbed-TLS/mbedtls/issues/3391 https://github.com/Mbed-TLS/mbedtls/issues/3263
I created a PR on GoogleCloud.jl, but should anything be done in this library? At least a warning in the documentation, perhaps?
Should have been fixed via https://github.com/JuliaLang/MbedTLS.jl/pull/245
I'm working with GoogleCloud.jl, which depends on MbedTLS.jl. Sometimes I've been getting errors like this:
I'm thinking it might be due to thread conflicts. Is MbedTLS thread-safe? Any recommendations how I might debug this? Or perhaps this might need to be fixed within GoogleCloud.jl?