jrevels opened this issue 8 years ago
Hmm, without looking into the details, my guess is that the assignment to w lifts the scope of the variable and effectively introduces a race condition between threads (since all of them are modifying and reading the same variable). Does it help if you protect the access to w with a lock?
i.e. something like
function wrong1(v)
    ws = ntuple(w->Vector{Int}(length(v)), nthreads())
    @threads for t in 1:nthreads()
        for i in 1:length(v)
            w = ws[t]
            w2 = w
            w2[i] = v[i]
        end
    end
    w = 1 # unused binding to the name of the previously used loop variable
    return ws
end
vs
function wrong2(v)
    ws = ntuple(w->Vector{Int}(length(v)), nthreads())
    lock = SpinLock()
    @threads for t in 1:nthreads()
        for i in 1:length(v)
            lock!(lock)
            w = ws[t]
            w2 = w
            unlock!(lock)
            w2[i] = v[i]
        end
    end
    w = 1 # unused binding to the name of the previously used loop variable
    return ws
end
The lock fixes it, so I think you're right about the scope lifting of w. Thus, another way to fix this - without introducing the lock - is by declaring w as local inside the loop. I guess this is semantically consistent with Julia's current scoping behavior, but it means that the current scoping behavior is scary in a multithreaded context.
If we want the behavior to be as close as possible to a non-threading version, we could instead make all assignments in a @threads block local (whether or not threading is enabled). Hopefully, if a loop is declared @threads, it usually shouldn't need to assign to any local variable from the enclosing scope. However, in case that is actually what the user wants (finding a unique match in a threaded loop, maybe?) I think there should be a way to do it (like Python's nonlocal).
This doesn't help if the user calls another closure that assigns to a local variable in the outer scope, but I don't think we can fix that without completely breaking the scoping rules.
This is actually what the hacky localize_vars
thing was introduced for --- to copy bindings so that assignments can't leak outside of a certain block. Originally for the distributed case, where you don't want binding relationships to depend on which machine the code runs on.
You do want to be able to access shared variables in an outer scope from inside a parallel loop; it's a simple and useful pattern. I don't think we can force local-only assignments.
I think the right answer is to generate a warning when a variable's scope is lifted after there has been an assignment to it. Practical?
What do you mean by "local-only assignments"? For example, I'm not sure what this code is meant to do:
x = 1
@threads for i = 1:n
    x = i
    f(x)
end
In this case, why is it useful to share x
among the threads?
And I think this can be done with something like,
x = Ref(0)
@threads for i in 1:n
    f(i) && (x[] = 1)
end
even if we "localize" all the variables in a @threads
block.
By "local-only assignments", I was referring to @yuyichao's suggestion to make all assignments in an @threads
block local. I don't think this is the right answer because it is useful to share outer-scope variables among threads. It doesn't seem useful to share x
among the threads for the case you mention, but:
x = Atomic()
@threads for i = 1:n
    atomic_add!(x, foo(i))
end
bar(x[])
Or:
x = ConcurrentMap()
@threads for i = 1:n
    if foo(i)
        append!(x, (bar(i), baz(i))) # or something like this
    end
end
# iterate through x doing something
Neither of the code examples you have has an assignment in the loop. There are mutations of mutable objects, but no assignments, so they won't be affected at all.
Given foo(x)
returns true
for only one element of some collection C
, then:
p = false
@threads for i = 1:length(C)
    if foo(C[i])
        p = true
    end
end
if p
    # do something
end
The principle is the same.
This pattern can be covered with the Ref approach I mentioned above.
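For instance, a sketch of the same flag pattern written with a Ref (C and foo are the placeholders from above); the outer binding p is never assigned inside the loop, only the value it wraps is mutated:

p = Ref(false)
@threads for i = 1:length(C)
    if foo(C[i])
        p[] = true   # mutation of the Ref, not an assignment to p
    end
end
if p[]
    # do something
end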
Sure. There are all sorts of ways to get those semantics, but are they clear semantics? My feeling is not.
What's not clear about this?
I think allowing assignment to an enclosing-scope local variable by default is not very clear, given the example in the original post. Raising a warning based on some guessing doesn't seem too nice either, since the actual problem is not about where the variable is defined in the enclosing scope, but that there's code that works single-threaded (reusing a variable from the enclosing scope) but not with multi-threading.
The alternative semantics is that you can't assign to an enclosing-scope local variable in a @threads block. That seems pretty clear to me (even though it's different from a normal for loop; OTOH it's closer to a parallel for loop, which is basically what a @threads loop is). On top of this, we only need a way to pass a value from the @threads block to the outer scope, which can be done either with a keyword like Python's nonlocal or explicitly with a mutable object local to the outer scope. Comparing these two approaches, I think using a Ref explicitly is more flexible and has very clear semantics.
To me, the following two snippets are inconsistent.
x = 0
@threads for i = 1:n
    if foo(i)
        x = threadid()
    end
end
# x == 0

x = Int[0]
@threads for i = 1:n
    if foo(i)
        x[1] = threadid()
    end
end
# x[1] != 0
If there is a well-understood difference between assignment to a variable and mutation via a call among Julia programmers, then I guess this is just me.
Still, the other thing to consider is that the serial elision of the first loop (i.e. remove just @threads
) will have different behavior.
difference between assignment to a variable and mutation via a call
I think this is a very fundamental distinction in the semantics.
Still, the other thing to consider is that the serial elision of the first loop (i.e. remove just
@threads
) will have different behavior.
It doesn't have to. The same goes for the behavior on a non-threading build.
It doesn't have to. The same goes for the behavior on a non-threading build.
Not sure what you mean? The localization approach does mean that the behavior of the code will be different if you run the loop serially vs. if you run it in parallel.
One of the nicest things about Cilk is serial equivalence (https://software.intel.com/en-us/blogs/2012/04/07/serial-equivalence-of-cilk-plus-programs). I'm not sure we actually have it with Julia threads at the moment, but it is a worthwhile goal.
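A small illustrative sketch of serial equivalence in this setting: because each iteration writes a distinct slot of out, removing @threads (or running with a single thread) gives exactly the same result.

using Base.Threads: @threads

function squares(n)
    out = Vector{Int}(undef, n)
    @threads for i in 1:n
        out[i] = i^2   # each iteration touches its own slot, so there is no race
    end
    return out
end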
I think I agree with @kpamnany here; with threads it is possible and efficient to keep the same scope behavior we have now. The only downside is race conditions, but those are possible anyway with assignment to arrays.
I think the only reason an issue was filed here is that a variable can be added to a scope by an assignment that occurs after a use of the variable. I can see why this is confusing, but it's what we've always done. It tends to make a lot of sense in a case like
function f()
    g(x) = h(x)
    h(x) = g(x)
end
The localization approach does mean that the behavior of the code will be different if you run the loop serially vs. if you run it in parallel.
No, it means that the behavior is different when @threads
is added but as long as it's in a @threads
block, the behavior will be the same.
I think the only reason an issue was filed here is that a variable can be added to a scope by an assignment that occurs after a use of the variable
I don't agree. There can be code like,
# <...>
x = ... # this is as likely to happen as if x is used as a temporary variable after the loop
# <...>
@threads for ...
    # <...>
    x = ...
    # <...>
end
# don't care about the value of x here
The code uses x as a variable name both inside and outside the threading block (maybe because it is a name that makes sense for such a thing), and the code works fine when running with a single thread but fails when there's more than one thread.
Yes, it's really easy to write parallel code with race conditions. Nothing new there.
Sure. I'm fine if we just say this is a user error and we will do nothing about it (and the user should add local to all the variables that would otherwise be accidentally shared).
I just don't think the problem is the order, and we shouldn't be adding a warning for such a case.
I just don't think the problem is the order
I'm very glad to hear that, because I don't really want to change this :)
Can you explain this please?
function f()
    g(x) = h(x)
    h(x) = g(x)
end
In the absence of understanding of the above, I still think that lifting a variable's scope after use of that variable (lexically speaking) should cause a warning. There's no guesswork in such a rule, but I guess there are other priorities.
A race condition checker should be on the TODO list somewhere. :-/
@kpamnany In that code h
is referenced before it is introduced by assignment, and this is an idiomatic way to write mutually-recursive inner functions.
Don't use @threads unless you want me in on this, guys...
@threads Sorry, I guess I forgot to put it in a code block when quoting the reply ...
Why does the code following @threads
behave like a block? It really should behave like a function instead. It's a cute syntax that you can just throw an @threads
in front of a for
loop, but the reality is that this introduces a rather deep change, and one has to be quite careful about which loops remain correct when doing so. As we've learned, it's not even sufficient to only study the content of the loop, and this is deeply surprising even to experts.
We can of course keep the syntax, but we can define that the internal representation corresponds to a lambda expression, so that all local variable assignments remain hidden. This transformation can still be done when multi-threading is disabled; it's actually probably useful for SIMD-vectorization etc. as well.
It is lowered to a lambda and has the exact same scoping behavior. Whether this behavior (mutating closed-over variables) is a good idea is another question; I think in some cases you do want that.
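A minimal illustration of why lowering to a lambda does not by itself hide assignments (a sketch, not the actual @threads lowering): Julia closures capture enclosing locals by reference, so an assignment inside the closure rebinds the outer variable.

function closure_capture_demo()
    x = 0
    body = i -> (x = i)   # the closure captures x from the enclosing scope
    for i in 1:3
        body(i)           # each call rebinds the outer x
    end
    return x              # returns 3, not 0
end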
@JeffBezanson actually that makes me think that an argument against doing that by default is that it's the behavior of global scope: assigning to a global variable requires declaring it global even if it already exists, to "pull" the binding in from the upper scope. We could do the same in anonymous functions (using another keyword).
The threaded loop macro is special, though, because I think it's quite convenient that it pretends to be a proper for loop.
Just to add my 2c: I think we should warn about, or even prevent, access to shared variables or overlapping array ranges from different threads, or in any parallel code. Only if you can deduce that the outer-scope access is mutually exclusive or race-condition-free (by using a lock or a CAS operation) should it compile normally. Also, the same checking infrastructure could be used in other parallelization constructs.
This problem (deciding whether dynamic array ranges overlap) is undecidable. Worse, it's hard for all but the simplest cases. You get to pick between a deluge of false positives or a safety feature that almost never triggers.
The solution to this issue must be a consistent rule. It's fine for SMP programming to be hard if you want to do it manually; let's just find a way to avoid the mistakes that are avoidable, such as the one in this issue, rather than building a full-blown static race-condition checker :-)
Whatever works. Languages like Nim (http://nim-lang.org/) use a proof assistant to check these conditions. Having said this, as you say, perhaps you can start with the most common cases and gradually expand which concurrency errors are caught, perhaps using a formal approach, so that one day nearly every case is checked.
I think the conclusion here is that this is intended behavior and there are no current plans to change it. If so, are you happy to close this @jrevels?
The consensus here makes sense to me, but I could see this issue being a common pitfall for new users, so I'd like to make sure that it eventually gets a mention in our multithreading documentation (which doesn't yet exist, but I'm assuming will one day once multithreading has stabilized). I could close this and open a new docs issue, if people prefer, otherwise I'll just leave this issue open.
More help from the compiler, with an error when there is a definite race condition and a warning when there is a potential one, would be the best way forward. You can add it to the docs, but it's best to learn about this at compile time, when there is an issue with the code.
Closing since this is by design and fixed by resolving the race condition.
Combined with our assignment rules in if
/else
blocks, e.g.
if use_threads
    @threads for i = 1:nthreads()
        Xlocal = copy(X) # because I want a thread local copy of X
    end
else
    Xlocal = X # I don't need a copy when I don't use threads but suddenly `Xlocal` in the other branch is not local anymore.
end
I think the current behavior of @threads
will cause some confusion for Julia users that are not very familiar with scoping rules. If that is a group we care about, I think we should reopen this issue.
Even if this behavior doesn't change, this issue was closed despite presenting a still uncompleted action item - that's why I left it open in the first place. I'm assuming that nobody would argue against documenting this behavior explicitly (if this is the behavior we're keeping)? Since that documentation still doesn't exist, I'm going to reopen this.
@andreasnoack: the comment in your code snippet is truncated, so I'm not sure what the confusion is?
@kpamnany Sorry. Have updated the comment. The problem is that the use of Xlocal
in the second branch changes the behavior of Xlocal
in the first branch.
OpenMP deals with this by supporting private
and shared
clauses to indicate which of the variables that are already in scope should have per-thread copies and which should not. But these are the sort of fiddly details that hamper ease of use, and I don't think we should add controls for every such thing.
I believe that once some documentation is in, this will not be very confusing. We can certainly leave the issue open to remind us (me I guess) to write this up.
We have local
, but the main difference between us and C/C++ is that we don't require explicit variable declaration so the scope of a variable is less clear for the reader.
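For example, a sketch of how the Xlocal example above could be written with an explicit local declaration, so that the assignment in the serial branch cannot lift the variable used in the threaded branch (use_threads and X are the placeholders from that example):

if use_threads
    @threads for i = 1:nthreads()
        local Xlocal = copy(X)   # explicitly a new per-iteration (and hence per-thread) binding
        # ... work with Xlocal ...
    end
else
    Xlocal = X
    # ... work with Xlocal ...
end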
In my applications, it seems as if such local variables are more common; e.g. the code that made me post here would require ten locals. The solution I ended up with wraps all of the threaded code in a function, like
if use_threads
    fooThreads!(a,b,c,d,e)
else
    fooSeq!(a,b,c,d,e)
end
with
function fooThreads!(a,b,c,d,e)
    @threads for i in 1:nthreads()
        ...
    end
end
I've ended up with a similar pattern in other use cases of @threads where it helped performance, so I'm wondering if the behavior of @threads for is the most useful possible at the moment.
I'd try to not have anything like use_threads
-- when you run serially, nthreads()
is 1, so the code should just work as it does in the multithreaded case. Which is the whole point of serial equivalence. Is that not a viable approach?
In the final version, it should be fine to control threading with JULIA_NUM_THREADS
only but here I made the branch to have an easy way to investigate how @threads
affected the results. It is then ironic that the threaded code only had problems because I added a branch to check if it was correct.
I guess this is more or less closed by #33119 (and #30896 is basically a duplicate of this issue), but this was left open because we lack documentation explaining how to avoid this problem. I guess now we should add documentation explaining how to use $
-interpolation in @async
and @spawn
to avoid this variable lifting.
In fact, it looks like @spawn
still isn't mentioned at all in the https://docs.julialang.org/en/v1.4-dev/manual/parallel-computing/ page, so there's probably a lot of work to be done on documentation..
I think this issue has nothing to do with threads. The confusion comes from the scoping rules: the assignment to a variable after the loop makes this variable non-local in the loop. So it's about ordering, and from a newbie perspective this is not intuitive. It is also inconsistent with the interactive global scoping rules (for an obvious reason: we can't know the future global scope).
I've edited the second part of this comment since I found this motivation for the scoping rule. It is sensible indeed. BTW, that comment explains the rule more clearly than the docs; I think the docs should emphasize this more, since Julia is different from many other languages in this regard.
It would make sense to emit a compiler warning if a variable leaves a scope only because of how it is used later in the same function. That could be extended with a simple mechanism to declare local variables further up in a function.
Stefan Karpinski's argument (that semantics should be independent of order) implies that a variable's scope begins before it is first used. This is counter-intuitive inside a function, where statements are always executed in order, and it catches people by surprise.
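A tiny sketch of the pattern being described (names are just for illustration): y is already in scope on the first line of the function body because of the assignment further down, even though statements execute in order.

function scope_demo()
    getx = () -> y   # refers to the y introduced by the assignment below
    y = 42
    return getx()    # 42
end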
I ran into this very strange bug while exploring the new multithreading stuff. The smallest reproducible case I can come up with:
Note that the only difference between the above two functions is w2 = 1 vs. w = 1, and that neither variable is actually used after being assigned. Running the above in the REPL:
It appears that some iterations of the multithreaded loop in wrong(v) aren't always being executed, or possibly that the binding of the loop variable w to ws[t] is broken. Either way, one can induce the correct behavior by making sure the post-loop binding to 1 doesn't match the loop variable name. I thought maybe that this was an escaping/scoping bug in _threadsfor, but I didn't see anything that jumped out at me. @mlubin