jrevels opened this issue 8 years ago
Hmm, without looking into the details, my guess is that the assignment to w lifts the scope of the variable and effectively introduces a race condition between threads (since all of them are modifying and reading the same variable). Does it help if you protect the access to w with a lock?
i.e. something like
function wrong1(v)
    ws = ntuple(w->Vector{Int}(length(v)), nthreads())
    @threads for t in 1:nthreads()
        for i in 1:length(v)
            w = ws[t]
            w2 = w
            w2[i] = v[i]
        end
    end
    w = 1 # unused binding to the name of the previously used loop variable
    return ws
end
vs
function wrong2(v)
    ws = ntuple(w->Vector{Int}(length(v)), nthreads())
    lock = SpinLock()
    @threads for t in 1:nthreads()
        for i in 1:length(v)
            lock!(lock)
            w = ws[t]
            w2 = w
            unlock!(lock)
            w2[i] = v[i]
        end
    end
    w = 1 # unused binding to the name of the previously used loop variable
    return ws
end
The lock fixes it, so I think you're right about the scope lifting of w. Thus, another way to fix this - without introducing the lock - is by declaring w as local inside the loop. I guess this is semantically consistent with Julia's current scoping behavior, but it means that the current scoping behavior is scary in a multithreaded context.
If we want the behavior to be as close as possible to a non-threading version, we could instead make all assignments in a @threads block local (whether or not threading is enabled). Hopefully, if a loop is declared @threads, it usually shouldn't need to assign to any local variable from the enclosing scope. However, in case that is actually what the user wants (finding a unique match in a threaded loop, maybe?) I think there should be a way to do it (like Python's nonlocal).
This doesn't help if the user calls another closure that assigns to a local variable in the outer scope, but I don't think we can fix that without completely breaking the scoping rules.
This is actually what the hacky localize_vars
thing was introduced for --- to copy bindings so that assignments can't leak outside of a certain block. Originally for the distributed case, where you don't want binding relationships to depend on which machine the code runs on.
You do want to be able to access shared variables in an outer scope from inside a parallel loop; it's a simple and useful pattern. I don't think we can force local-only assignments.
I think the right answer is to generate a warning when a variable's scope is lifted after there has been an assignment to it. Practical?
What do you mean by "local-only assignments"? For example, I'm not sure what this code is meant to do:
x = 1
@threads for i = 1:n
    x = i
    f(x)
end
In this case, why is it useful to share x
among the threads?
And I think this can be done with something like,
x = Ref(0)
@threads for i in 1:n
    f(i) && (x[] = 1)
end
even if we "localize" all the variables in a @threads
block.
By "local-only assignments", I was referring to @yuyichao's suggestion to make all assignments in an @threads
block local. I don't think this is the right answer because it is useful to share outer-scope variables among threads. It doesn't seem useful to share x
among the threads for the case you mention, but:
x = Atomic()
@threads for i = 1:n
    atomic_add!(x, foo(i))
end
bar(x[])
Or:
x = ConcurrentMap()
@threads for i = 1:n
    if foo(i)
        append!(x, (bar(i), baz(i))) # or something like this
    end
end
# iterate through x doing something
Neither of the code examples you have has an assignment in the loop. There are mutations of mutable objects, but no assignments, so they won't be affected at all.
Given foo(x)
returns true
for only one element of some collection C
, then:
p = false
@threads for i = 1:length(C)
    if foo(C[i])
        p = true
    end
end
if p
    # do something
end
The principle is the same.
This pattern can be covered with the Ref approach I mentioned above.
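For instance, a sketch of the same flag pattern written with a Ref (C and foo are the placeholders from above); the outer binding p is never assigned inside the loop, only the value it wraps is mutated:

p = Ref(false)
@threads for i = 1:length(C)
    if foo(C[i])
        p[] = true   # mutation of the Ref, not an assignment to p
    end
end
if p[]
    # do something
end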
Sure. There are all sorts of ways to get those semantics, but are they clear semantics? My feeling is not.
What's not clear about this?
I think allowing assignment to an enclosing-scope local variable by default is not very clear, given the example in the original post. Raising a warning based on some guessing doesn't seem too nice either, since the actual problem is not about where the variable is defined in the enclosing scope, but that there's code that works single-threaded (reusing a variable from the enclosing scope) but not with multi-threading.
The alternative semantics is that you can't assign to an enclosing-scope local variable in a @threads block. That seems pretty clear to me (even though it's different from a normal for loop; OTOH it's closer to a parallel for loop, which is basically what a @threads loop is). On top of this, we only need a way to pass a value from the @threads block to the outer scope, which can be done either with a keyword like Python's nonlocal or explicitly with a mutable object local to the outer scope. Comparing these two approaches, I think using a Ref explicitly is more flexible and has very clear semantics.
To me, the following two snippets are inconsistent.
x = 0
@threads for i = 1:n
    if foo(i)
        x = threadid()
    end
end
# x == 0

x = Int[0]
@threads for i = 1:n
    if foo(i)
        x[1] = threadid()
    end
end
# x[1] != 0
If there is a well-understood difference between assignment to a variable and mutation via a call among Julia programmers, then I guess this is just me.
Still, the other thing to consider is that the serial elision of the first loop (i.e. remove just @threads
) will have different behavior.
difference between assignment to a variable and mutation via a call
I think this is a very fundamental distinction in the semantics.
Still, the other thing to consider is that the serial elision of the first loop (i.e. remove just
@threads
) will have different behavior.
It doesn't have to. The same goes for the behavior on a non-threading build.
It doesn't have to. The same goes for the behavior on a non-threading build.
Not sure what you mean? The localization approach does mean that the behavior of the code will be different if you run the loop serially vs. if you run it in parallel.
One of the nicest things about Cilk is serial equivalence (https://software.intel.com/en-us/blogs/2012/04/07/serial-equivalence-of-cilk-plus-programs). I'm not sure we actually have it with Julia threads at the moment, but it is a worthwhile goal.
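A small illustrative sketch of serial equivalence in this setting: because each iteration writes a distinct slot of out, removing @threads (or running with a single thread) gives exactly the same result.

using Base.Threads: @threads

function squares(n)
    out = Vector{Int}(undef, n)
    @threads for i in 1:n
        out[i] = i^2   # each iteration touches its own slot, so there is no race
    end
    return out
end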
I think I agree with @kpamnany here; with threads it is possible and efficient to keep the same scope behavior we have now. The only downside is race conditions, but those are possible anyway with assignment to arrays.
I think the only reason an issue was filed here is that a variable can be added to a scope by an assignment that occurs after a use of the variable. I can see why this is confusing, but it's what we've always done. It tends to make a lot of sense in a case like
function f()
    g(x) = h(x)
    h(x) = g(x)
end
The localization approach does mean that the behavior of the code will be different if you run the loop serially vs. if you run it in parallel.
No, it means that the behavior is different when @threads
is added but as long as it's in a @threads
block, the behavior will be the same.
I think the only reason an issue was filed here is that a variable can be added to a scope by an assignment that occurs after a use of the variable
I don't agree. There can be code like,
# <...>
x = ... # this is as likely to happen as if x is used as a temporary variable after the loop
# <...>
@threads for ...
    # <...>
    x = ...
    # <...>
end
# don't care about the value of x here
The code uses x as a variable name both inside and outside the threading block (maybe because it is a name that makes sense for such a thing), and the code works fine when running with a single thread but fails when there's more than one thread.
Yes, it's really easy to write parallel code with race conditions. Nothing new there.
Sure. I'm fine if we just say this is a user error and we will do nothing about it (and the user should add local to all the variables that would otherwise be accidentally shared).
I just don't think the problem is the order, and we shouldn't be adding a warning for such a case.
I just don't think the problem is the order
I'm very glad to hear that, because I don't really want to change this :)
Can you explain this please?
function f()
    g(x) = h(x)
    h(x) = g(x)
end
In the absence of understanding of the above, I still think that lifting a variable's scope after use of that variable (lexically speaking) should cause a warning. There's no guesswork in such a rule, but I guess there are other priorities.
A race condition checker should be on the TODO list somewhere. :-/
@kpamnany In that code h
is referenced before it is introduced by assignment, and this is an idiomatic way to write mutually-recursive inner functions.
Don't use @threads unless you want me in on this, guys...
@threads Sorry, I guess I forgot to put it in a code block when quoting the reply ...
Why does the code following @threads
behave like a block? It really should behave like a function instead. It's a cute syntax that you can just throw an @threads
in front of a for
loop, but the reality is that this introduces a rather deep change, and one has to be quite careful about which loops remain correct when doing so. As we've learned, it's not even sufficient to only study the content of the loop, and this is deeply surprising even to experts.
We can of course keep the syntax, but we can define that the internal representation corresponds to a lambda expression, so that all local variable assignments remain hidden. This transformation can still be done when multi-threading is disabled; it's actually probably useful for SIMD-vectorization etc. as well.
It is lowered to a lambda and has the exact same scoping behavior. Whether this behavior (mutating closed-over variables) is a good idea is another question; I think in some cases you do want that.
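A minimal illustration of why lowering to a lambda does not by itself hide assignments (a sketch, not the actual @threads lowering): Julia closures capture enclosing locals by reference, so an assignment inside the closure rebinds the outer variable.

function closure_capture_demo()
    x = 0
    body = i -> (x = i)   # the closure captures x from the enclosing scope
    for i in 1:3
        body(i)           # each call rebinds the outer x
    end
    return x              # returns 3, not 0
end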
@JeffBezanson actually that makes me think that an argument against doing that by default is that it's the behavior of global scope: assigning to a global variable requires declaring it global even if it already exists, to "pull" the binding in from the upper scope. We could do the same in anonymous functions (using another keyword).
The threaded loop macro is special, though, because I think it's quite convenient that it pretends to be a proper for loop.
Just to add my 2c: I think we should warn about, or even prevent, access to shared variables or overlapping array ranges from different threads, or in any parallel code. Only if you can deduce that the outer-scope access is mutually exclusive or race-condition-free (by using a lock or a CAS operation) should it compile normally. Also, the same checking infrastructure could be used in other parallelization constructs.
This problem (deciding whether dynamic array ranges overlap) is undecidable. Worse, it's hard for all but the simplest cases. You get to pick between a deluge of false positives or a safety feature that almost never triggers.
The solution to this issue must be a consistent rule. It's fine for SMP programming to be hard if you want to do it manually; let's just find a way to avoid the mistakes that are avoidable, such as the one in this issue, rather than building a full-blown static race-condition checker :-)
Whatever works. Languages like Nim (http://nim-lang.org/) use a proof assistant to check these conditions. Having said this, as you say, perhaps you can start with the most common cases and gradually expand which concurrency errors are caught, perhaps using a formal approach, so that one day nearly every case is checked.
I think the conclusion here is that this is intended behavior and there are no current plans to change it. If so, are you happy to close this @jrevels?
The consensus here makes sense to me, but I could see this issue being a common pitfall for new users, so I'd like to make sure that it eventually gets a mention in our multithreading documentation (which doesn't yet exist, but I'm assuming will one day once multithreading has stabilized). I could close this and open a new docs issue, if people prefer, otherwise I'll just leave this issue open.
More help from the compiler, with an error when there is a definite race condition and a warning when there is a potential one, would be the best way forward. You can add it to the docs, but it's best to learn about this at compile time, when there is an issue with the code.
Closing since this is by design and fixed by resolving the race condition.
Combined with our assignment rules in if
/else
blocks, e.g.
if use_threads
    @threads for i = 1:nthreads()
        Xlocal = copy(X) # because I want a thread local copy of X
    end
else
    Xlocal = X # I don't need a copy when I don't use threads but suddenly `Xlocal` in the other branch is not local anymore.
end
I think the current behavior of @threads
will cause some confusion for Julia users that are not very familiar with scoping rules. If that is a group we care about, I think we should reopen this issue.
Even if this behavior doesn't change, this issue was closed despite presenting a still uncompleted action item - that's why I left it open in the first place. I'm assuming that nobody would argue against documenting this behavior explicitly (if this is the behavior we're keeping)? Since that documentation still doesn't exist, I'm going to reopen this.
@andreasnoack: the comment in your code snippet is truncated, so I'm not sure what the confusion is?
@kpamnany Sorry. Have updated the comment. The problem is that the use of Xlocal
in the second branch changes the behavior of Xlocal
in the first branch.
OpenMP deals with this by supporting private
and shared
clauses to indicate which of the variables that are already in scope should have per-thread copies and which should not. But these are the sort of fiddly details that hamper ease of use, and I don't think we should add controls for every such thing.
I believe that once some documentation is in, this will not be very confusing. We can certainly leave the issue open to remind us (me I guess) to write this up.
We have local
, but the main difference between us and C/C++ is that we don't require explicit variable declaration so the scope of a variable is less clear for the reader.
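For example, a sketch of how the Xlocal example above could be written with an explicit local declaration, so that the assignment in the serial branch cannot lift the variable used in the threaded branch (use_threads and X are the placeholders from that example):

if use_threads
    @threads for i = 1:nthreads()
        local Xlocal = copy(X)   # explicitly a new per-iteration (and hence per-thread) binding
        # ... work with Xlocal ...
    end
else
    Xlocal = X
    # ... work with Xlocal ...
end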
In my applications, it seems as if such local variables are more common; e.g. the code that made me post here would require ten locals. The solution I ended up with wraps all of the threaded code in a function, like
if use_threads
    fooThreads!(a,b,c,d,e)
else
    fooSeq!(a,b,c,d,e)
end
with
function fooThreads!(a,b,c,d,e)
    @threads for i in 1:nthreads()
        ...
    end
end
I've ended up with a similar pattern in other use cases of @threads where it helped performance, so I'm wondering if the behavior of @threads for is the most useful possible at the moment.
I'd try to not have anything like use_threads
-- when you run serially, nthreads()
is 1, so the code should just work as it does in the multithreaded case. Which is the whole point of serial equivalence. Is that not a viable approach?
In the final version, it should be fine to control threading with JULIA_NUM_THREADS
only but here I made the branch to have an easy way to investigate how @threads
affected the results. It is then ironic that the threaded code only had problems because I added a branch to check if it was correct.
I guess this is more or less closed by #33119 (and #30896 is basically a duplicate of this issue), but this was left open because we lack documentation explaining how to avoid this problem. I guess now we should add documentation explaining how to use $
-interpolation in @async
and @spawn
to avoid this variable lifting.
In fact, it looks like @spawn
still isn't mentioned at all in the https://docs.julialang.org/en/v1.4-dev/manual/parallel-computing/ page, so there's probably a lot of work to be done on documentation..
I think this issue has nothing to do with threads. The confusion comes from the scoping rules: the assignment to a variable after the loop makes this variable non-local in the loop. So it's about ordering, and from a newbie perspective this is not intuitive. It is also inconsistent with the interactive global scoping rules (for an obvious reason: we can't know the future global scope).
I've edited the second part of this comment since I found this motivation for the scoping rule. It is sensible indeed. BTW, that comment explains the rule more clearly than the docs; I think the docs should emphasize this more, since Julia is different from many other languages in this regard.
It would make sense to emit a compiler warning if a variable leaves a scope only because of how it is used later in the same function. That could be extended with a simple mechanism to declare local variables further up in a function.
Stefan Karpinski's argument (that semantics should be independent of order) implies that a variable's scope begins before it is first used. This is counter-intuitive inside a function, where statements are always executed in order, and it catches people by surprise.
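A tiny sketch of the pattern being described (names are just for illustration): y is already in scope on the first line of the function body because of the assignment further down, even though statements execute in order.

function scope_demo()
    getx = () -> y   # refers to the y introduced by the assignment below
    y = 42
    return getx()    # 42
end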
I ran into this very strange bug while exploring the new multithreading stuff. The smallest reproducible case I can come up with:
Note that the only difference between the above two functions is w2 = 1 vs. w = 1, and that neither variable is actually used after being assigned. Running the above in the REPL:
It appears that some iterations of the multithreaded loop in wrong(v) aren't always being executed, or possibly that the binding of the loop variable w to ws[t] is broken. Either way, one can induce the correct behavior by making sure the post-loop binding to 1 doesn't match the loop variable name. I thought maybe that this was an escaping/scoping bug in _threadsfor, but I didn't see anything that jumped out at me. @mlubin