bovine3dom / Nauty.jl

A small Julia wrapper for nauty
Other
8 stars 5 forks source link

Multi Threaded use #5

Closed kakodkar closed 3 years ago

kakodkar commented 3 years ago

Hi I was trying to use your library in a multi threaded environment and got segmentation faults. I was able to fix it in my own project by running ./configure --enable-tls on the nauty subfolder. I don't know much about Julia and hence am unsure of how to fix it in this library. Hence I just thought I'll drop a note.

Thanks!

Source for the above fix: https://users.cecs.anu.edu.au/~bdm/nauty/nug26.pdf Section 16 bullet point (c).

cmcaine commented 3 years ago

Hi, I'm glad you find this library useful! Thanks for tracking down the fix!

The C wrapper was written by me a few years ago and doesn't call configure. I think the simplest way to fix this is to have our deps/build.jl run configure --enable-tls before it tries to compile the wrapper. That will cause nauty.h to be overwritten with a new header that enables thread local storage.

I'll see about writing a PR later :)

cmcaine commented 3 years ago

I've made a PR. Would you mind sharing a code example where multithreading made the old code fail? We can use that as a test.

kakodkar commented 3 years ago

I cant share the exact code I am working on so I've tried to construct this naive example. Please let me know if this case works for you, i.e. fails randomly in the current version and passes in the new version.

I tested running this with julia -t 1 and julia -t 10. The former runs always. The latter fails randomly.

using LightGraphs
using SNAPDatasets
using Nauty

g = loadsnap(:email_enron)
k = 10

function run_nauty_on_subgraph(g, k)
    sg = [rand(1:nv(g))]
    while length(sg) != k
        neighborhood = reduce(vcat, map((x) -> neighbors(g,x), sg))
        u = rand(neighborhood)
        if ! in(u,sg)
            append!(sg, u)
        end
    end
    subgraph, _ = induced_subgraph(g, sg)
    return Nauty.baked_canonical_form(subgraph).canong
end

print(map(fetch,map((_)->(Threads.@spawn run_nauty_on_subgraph(g, k)), 1:100)))
kakodkar commented 3 years ago

I am amazed at the speed of development :-D, Thanks!

cmcaine commented 3 years ago

Thanks. I was able to reproduce the error initially, but it went away again. The PR seems fine and doesn't break the existing tests, though, so we can probably merge it (that'll be up to @bovine3dom).

kakodkar commented 3 years ago

Oh yes, the error only pops up probably when a race condition arises in the nauty lib (please correct me if I am wrong). You can increase the probability of the error occurring by changing the last line to say

print(map(fetch,map((_)->(Threads.@spawn run_nauty_on_subgraph(g, k)), 1:10000)))

And by increasing the number of threads on a server. I was testing with 64 cores.

I too hate such errors which occur randomly!

cmcaine commented 3 years ago

I fixed my test to reliably fail, turns out I was just making some silly mistakes :)

This looks good and will probably get merged soon unless we decide to try and do something about the single threaded performance (which degrades by ~10% with this patch).

kakodkar commented 3 years ago

Yes that is true! They did warn about that. I don't understand CPP and Julia well enough, but is there a way to load a thread local copy of the library and access it? Ref: https://julialang.org/blog/2019/07/multithreading/#thread-local_state

bovine3dom commented 3 years ago

dumb approach: compile it twice, with and without TLS. Check if Threads.nthreads > 1 and then use the right version?

OTOH I'm not sure people (all ... 1 of them?) who use this care that much about performance since it got twice as slow moving from Julia 0.6 to 1.0 and no-one said anything.

cmcaine commented 3 years ago

Yeah, that's the obvious way to do it, though I imagine that:

  1. Nobody cares
  2. There's probably better performance wins to be had, anyway if we did care
  3. Some Julia environments (Juno, maybe VSCode) almost always set nthreads to the number of cores anyway, so it's not a great test (though we could test against Threads.threadid() or something).

Anyway, I think it's not worth worrying about.

kakodkar commented 3 years ago

Thanks!