JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.77k stars 5.49k forks source link

Data races in multi-queue initialization #45108

Open tkf opened 2 years ago

tkf commented 2 years ago

multiq_size non-atomically sets

https://github.com/JuliaLang/julia/blob/1c5648d5c3002cb4c6b118ad97502cfc9ca09c48/base/partr.jl#L88-L89

which are later loaded non-atomically in multiq_insert

https://github.com/JuliaLang/julia/blob/1c5648d5c3002cb4c6b118ad97502cfc9ca09c48/base/partr.jl#L98-L104

and in multiq_deletemin

https://github.com/JuliaLang/julia/blob/1c5648d5c3002cb4c6b118ad97502cfc9ca09c48/base/partr.jl#L128-L134

It may be possible to argue that loads are OK because of "consume ordering" and also because this piece of code is always compiled with a known compiler (and so that we can in principle verify that data dependencies exist in the compiled machine code). However, the old C/C++ spec requires release on the store sides (Release-Consume ordering) which does not exist in the current code. Also, I personally have a difficult time imagining an upside in doing such an ill-defined optimization for a fundamental infrastructure like a task scheduler of an official language runtime. It is very unfortunate that no one can reason about the correctness of our task runtime based on a well-defined memory model at the source code level. It is hard to read the code and guess how the implementer tries to establish the orderings since there are no visual markers.

I think there are two much simpler alternative approaches:

  1. Factor out the initialization as a function and call it reasonably early.
  2. Use double-checked locking.

@vtjnash @jpsamaroo @kpamnany What do you think?

vtjnash commented 2 years ago

We do not care if "cong_unbias" is wrong ever, since all it does is remove a tiny amount of bias from the queue. It is getting to be very unfortunate that we do not have array-atomics, to make this cleanly specified.

kpamnany commented 2 years ago

I'm not in favor of dynamically changing the number of threads; this is just one of the many, many places we'll have to handle dynamically resizing structures. However, as there is currently no interface to actually do this, this problem is minor.