TuringLang / DynamicPPL.jl

Implementation of domain-specific language (DSL) for dynamic probabilistic programming
https://turinglang.org/DynamicPPL.jl/
MIT License

Remove use of `threadid` #429

Open devmotion opened 1 year ago

devmotion commented 1 year ago

Just noticed that `ThreadSafeVarInfo` makes heavy use of `threadid`: https://github.com/TuringLang/DynamicPPL.jl/blob/715526ffa70292436e479e18d762e7ebf31c9181/src/threadsafe.jl#L22-L29

That should be changed since it is broken in Julia >= 1.8 if users don't use `@threads :static for ...` (see e.g. https://discourse.julialang.org/t/behavior-of-threads-threads-for-loop/76042).
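For illustration, a minimal sketch of the pattern in question and a task-local alternative; the function names are made up and this is not the actual DynamicPPL code:

```julia
using Base.Threads

# Problematic pattern: a buffer indexed by `threadid()`. With the default
# `:dynamic` scheduler in Julia >= 1.8 tasks can migrate between threads,
# so the thread id is not a stable index whenever the loop body yields.
function sum_logs(n)
    logps = zeros(nthreads())
    @threads for i in 1:n
        logps[threadid()] += log(i)
    end
    return sum(logps)
end

# Safer alternative: one accumulator per chunk, i.e. per task, so no
# indexing by thread id is needed at all.
function sum_logs_chunked(n)
    nchunks = nthreads()
    logps = zeros(nchunks)
    @threads for c in 1:nchunks
        for i in c:nchunks:n
            logps[c] += log(i)
        end
    end
    return sum(logps)
end
```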

devmotion commented 1 year ago

Generally, I feel the use of `ThreadSafeVarInfo` with non-mutable varinfo objects is probably broken and should maybe be disallowed. Every time different threads call any of the `!!` methods (or the ones without `!!` that still return a new varinfo object), the threads continue working with different varinfo objects.

The more general use with mutable varinfo objects could, I assume, be fixed by restricting access with a lock. But maybe that's not the most efficient way.
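A minimal sketch of such a lock-based fix, assuming a shared mutable accumulator; `LockedLogProb` and `acclogp!` are made-up names, not the actual DynamicPPL API:

```julia
using Base.Threads

# All tasks share one accumulator; every mutation goes through the lock,
# so concurrent `acclogp!` calls cannot interleave their read-modify-write.
struct LockedLogProb
    lock::ReentrantLock
    logp::Base.RefValue{Float64}
end
LockedLogProb() = LockedLogProb(ReentrantLock(), Ref(0.0))

function acclogp!(acc::LockedLogProb, logp)
    lock(acc.lock) do
        acc.logp[] += logp
    end
    return acc
end
```

The efficiency concern is contention: every accumulation from every task serializes on the same lock.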

torfjelde commented 1 year ago

> Generally, I feel the use of `ThreadSafeVarInfo` with non-mutable varinfo objects is probably broken and should maybe be disallowed.

So it should still be functional for evaluation because, yes, even though the returned `__varinfo__` is thread-local, the `logp` field which is mutated is still the same object.

For the immutable varinfos we actually have the opportunity to even parallelize sampling by implementing a `merge` for the varinfos, though it would require some manual labour from the user's side, e.g. a `mapreduce` over task-local varinfos.
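A rough sketch of that idea, where `make_varinfo`, `sample_step`, and `merge_varinfos` are hypothetical stand-ins rather than existing DynamicPPL functions:

```julia
using Base.Threads

function parallel_sample(n)
    nchunks = nthreads()
    tasks = map(1:nchunks) do _
        @spawn begin
            vi = make_varinfo()           # task-local immutable varinfo
            for _ in 1:cld(n, nchunks)
                vi = sample_step(vi)      # each step returns a *new* varinfo
            end
            vi
        end
    end
    # The manual labour: combining the task-local varinfos at the end.
    return reduce(merge_varinfos, map(fetch, tasks))
end
```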

> The more general use with mutable varinfo objects could, I assume, be fixed by restricting access with a lock.

Seems like a good idea :+1:

torfjelde commented 1 year ago

Btw, regarding this, how can `:dynamic` cause issues here?

It seems like `:dynamic` means that tasks can migrate between threads, but how is it then possible that two different tasks try to mutate the same index at the same time? For this to happen, it seems that you'd need one Task to yield on the line

vi.logps[Threads.threadid()] += logp

and then another Task, which is moved to the same thread, also hits this line before going back to "complete" this line in the original Task.

But there's no way the above line will actually cause the Task to yield back to the scheduler, right? So even though, yes, now there won't be a specific Task per thread, there still doesn't seem to be a scenario where the same thread will try to perform the above accumulation using two different tasks simultaneously?

Oooor am I misunderstanding something here?

devmotion commented 1 year ago

True, it seems that when opening the issue I thought it was worse than it actually is. Generally though, IMO using `threadid` is bad style even in cases where it currently does not introduce concurrency issues. I think the explanation in https://juliafolds.github.io/FLoops.jl/stable/explanation/faq/#faq-state-threadid is quite good. Currently this specific line does not yield to the scheduler AFAIK (e.g., it does not involve any IO operation or logging), but that's not guaranteed to be the case in future Julia versions.
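To make that concrete, a sketch of how a seemingly harmless addition could introduce a yield point and make the `threadid()`-indexed accumulation racy (illustrative code, not DynamicPPL's):

```julia
using Base.Threads

function sum_logs_racy(n)
    logps = zeros(nthreads())
    @threads :dynamic for i in 1:n
        tid = threadid()
        x = logps[tid]           # read
        @info "iteration $i"     # logging does IO and may yield; another task
                                 # can now run on this thread and update the
                                 # same slot, or this task may migrate
        logps[tid] = x + log(i)  # write: may silently discard that update
    end
    return sum(logps)
end
```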

torfjelde commented 1 year ago

> I think the explanation in https://juliafolds.github.io/FLoops.jl/stable/explanation/faq/#faq-state-threadid is quite good

Haha, this is exactly the explanation I was looking at that made me question your original worry :sweat_smile:

But no, I agree with you that the usage of `threadid` should be removed, because it could potentially introduce issues like those mentioned :+1:

devmotion commented 1 year ago

https://github.com/JuliaLang/julia/pull/48542

torfjelde commented 1 year ago

Ah neat!

Something like Dictionaries.jl would probably also be useful if you already know the return type and it's consistent across tasks.

EDIT: On second thought, usage of Dictionaries.jl is probably not a great idea as it probably expands the underlying Vector on demand, which would surely lead to issues when multiple threads are trying to write to it :sweat_smile:

torfjelde commented 1 year ago

How bad of an idea is it to create a random buffer of `Atomic`s and then use `atomic_add!`? I.e.

julia> Threads.nthreads()
16

julia> f(x) = log(x)
f (generic function with 1 method)

julia> g(n) = sum(f, 1:n)
g (generic function with 1 method)

julia> function g_t(n; buffer_size=Threads.nthreads())
           buffers = [Threads.Atomic{Float64}(0) for _ = 1:buffer_size]
           Threads.@threads for i = 1:n
               buffer_idx = rand(1:buffer_size)
               Threads.atomic_add!(buffers[buffer_idx], f(i))
           end
           return sum(getindex, buffers)  # getindex(a::Atomic) reads a[]
       end
g_t (generic function with 1 method)

julia> @btime $g($(10_000_000));
  74.448 ms (0 allocations: 0 bytes)

julia> @btime $g_t($(10_000_000); buffer_size=$(1));
  613.850 ms (101 allocations: 8.56 KiB)

julia> @btime $g_t($(10_000_000); buffer_size=$(2));
  386.434 ms (101 allocations: 8.55 KiB)

julia> @btime $g_t($(10_000_000); buffer_size=$(4));
  351.702 ms (102 allocations: 8.56 KiB)

julia> @btime $g_t($(10_000_000); buffer_size=$(16));
  152.913 ms (114 allocations: 8.84 KiB)

julia> @btime $g_t($(10_000_000); buffer_size=$(32));
  91.895 ms (130 allocations: 9.22 KiB)

julia> @btime $g_t($(10_000_000); buffer_size=$(64));
  88.032 ms (162 allocations: 9.98 KiB)

julia> @btime $g_t($(10_000_000); buffer_size=$(128));
  83.266 ms (227 allocations: 11.52 KiB)

julia> @btime $g_t($(10_000_000); buffer_size=$(1024));
  67.109 ms (1122 allocations: 32.55 KiB)

:upside_down_face:

EDIT: A probably better solution would be to have `buffer_size` locks that we check to see whether they are currently acquired, iterating through them until we hit one that isn't locked (with some max-tries of course before just waiting). The above was just a quick and dirty way to do something that would still improve efficiency quite drastically vs. a single `Atomic`.
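A quick sketch of that lock-pool idea using `trylock` over a fixed pool (illustrative only; the probing and fallback policy are made up):

```julia
using Base.Threads

function g_locks(n; buffer_size = nthreads(), maxtries = 3 * buffer_size)
    locks = [ReentrantLock() for _ in 1:buffer_size]
    sums = zeros(buffer_size)
    @threads for i in 1:n
        # Probe the pool starting from a random slot until a lock is free.
        start = rand(1:buffer_size)
        idx = 0
        for attempt in 0:(maxtries - 1)
            j = mod1(start + attempt, buffer_size)
            if trylock(locks[j])
                idx = j
                break
            end
        end
        if idx == 0
            # All probes failed: give up and block on one slot.
            idx = start
            lock(locks[idx])
        end
        try
            sums[idx] += log(i)
        finally
            unlock(locks[idx])
        end
    end
    return sum(sums)
end
```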

torfjelde commented 1 year ago

Oh and another thing: Atomics won't work once you put AD in there, right? E.g. `Atomic` seems to have rather limited type support, so I'd expect even something like `Dual` to not be usable.
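That does seem to be the case: `Threads.Atomic` is only defined for a fixed set of primitive types, so a dual number can't be stored in one. A quick check (exact error text may vary across Julia versions):

```julia
julia> Threads.Atomic{Float64}(0.0)      # primitive types are supported
Base.Threads.Atomic{Float64}(0.0)

julia> using ForwardDiff

julia> d = ForwardDiff.Dual(1.0, 1.0);

julia> Threads.Atomic{typeof(d)}(d)      # non-primitive types are not
ERROR: MethodError: no method matching (Base.Threads.Atomic{ForwardDiff.Dual{Nothing, Float64, 1}})(::ForwardDiff.Dual{Nothing, Float64, 1})
```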