compintell / Mooncake.jl

https://compintell.github.io/Mooncake.jl/
MIT License
140 stars 7 forks source link

Don't use `threadid` in e.g. `BBCode` #391

Open gdalle opened 5 days ago

gdalle commented 5 days ago

https://julialang.org/blog/2023/07/PSA-dont-use-threadid/

willtebbutt commented 4 days ago

So the specific bit of a code with a problem here is this:


const _id_count::Dict{Int,Int32} = Dict{Int,Int32}()

"""
    ID()

An `ID` (read: unique name) is just a wrapper around an `Int32`. Uniqueness is ensured via a
global counter, which is incremented each time that an `ID` is created.

This counter can be reset using `seed_id!` if you need to ensure deterministic `ID`s are
produced, in the same way that seed for random number generators can be set.
"""
struct ID
    id::Int32
    function ID()
        current_thread_id = Threads.threadid()
        id_count = get(_id_count, current_thread_id, Int32(0))
        _id_count[current_thread_id] = id_count + Int32(1)
        return new(id_count)
    end
end

which can be found here.

I think the only problem that was causing problems here previously was race conditions, in which multiple threads would attempt to access and increment the _id_count const (which was previously just an Int, rather than a Dict mapping threads to Ints). I believe that the above solves this, because there's no risk of race conditions. I could also have made use of atomic operations or a lock of some kind, and it would have been fine.

This doesn't solve the determinism problem though - it's quite helpful when debugging to have it such that each time you build the same rule, you get the same set of IDs being used for the same things each time. I suspect that I've not encountered this in practice because I've not stress tested this a huge amount on multi-threaded code.

Would indexing by current_task() work? i.e. my understanding is that a given Task can run on any available thread, and have its work moved between threads at arbitrary points in time during execution, but that the Task itself is independent of the thread that it's run on.

gdalle commented 4 days ago

TBH I'm not completely comfortable with the task/thread distinction, I just signaled it because I had the blog post title in mind

willtebbutt commented 4 days ago

Cool. I'll ask around a bit.