JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Inlining(?) Heisenbug in 0.4 and 0.5 #14405

Closed by iamed2 7 years ago

iamed2 commented 8 years ago

I discovered this when writing code for Advent of Code 14, so spoilers are contained in the Gist. I was unable to reproduce the issue using other code. I believe it has something to do with mutating SubArrays inside a function but I'm not sure.

Here is the Gist: https://gist.github.com/iamed2/5575061d8981dd4166fc

If you want to run the code, the directory structure is in 14.jl, just download the files and put them in their places.

Running the file on its own produces the bad output. Uncommenting any of the commented lines produces the good output, as does running with --inline=no or --precompiled=no. Notice that one of the commented lines is code_lowered: apparently if I try to inspect the code at all, the bug disappears!

This happens the same on these two Julia versions:

Julia Version 0.4.2-pre+127
Commit 4f951cf (2015-12-06 21:09 UTC)
DEBUG build
Platform Info:
  System: Darwin (x86_64-apple-darwin15.2.0)
  CPU: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

Julia Version 0.5.0-dev+1730
Commit a81101b (2015-12-09 16:42 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin15.0.0)
  CPU: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT NO_AFFINITY HASWELL)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

This is a particularly nasty bug, so any help or suggestions are much appreciated.

vtjnash commented 8 years ago

I'm starting to suspect this is an LLVM bug. Julia emits the same IR for fly! in either case. However, in the failing case LLVM has performed a loop unswitch (skipping the for loop over r.fly_time if it is <= 0 on entry to the function). This seems valid, though, so perhaps something else has gone wrong after that? The fly! function makes no calls, so there's nobody else to blame, but most of the writes to the array have failed to occur between the call and return of this function.

iamed2 commented 8 years ago

@omus has discovered that the bad results can be replicated by commenting out dists[time + 1] = dists[time], or by printing dists[time] before or after that line. So LLVM must be rearranging that assignment such that it assigns in the wrong order.
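
For readers without the Gist open, here is a rough sketch of the kind of loop under discussion. It is not the code from the Gist: the Reindeer type, its field names other than fly_time, and the body of fly! are all assumptions made for illustration, and it uses current Julia syntax (struct, view) rather than 0.4 syntax. The point is only to show an in-place update where each write to dists[time + 1] depends on the previous element, performed on a SubArray.

```julia
# Hypothetical stand-in for the Gist's data structure; only `fly_time`
# is named in this thread, the other fields are assumptions.
struct Reindeer
    speed::Int
    fly_time::Int
    rest_time::Int
end

# Fill `dists` in place so that each element is derived from the one before it.
function fly!(dists::AbstractVector{Int}, r::Reindeer)
    time = 1
    while time < length(dists)
        # Flying phase: advance by `speed` each second.
        for _ in 1:r.fly_time
            time < length(dists) || return dists
            dists[time + 1] = dists[time] + r.speed
            time += 1
        end
        # Resting phase: carry the previous distance forward unchanged.
        for _ in 1:r.rest_time
            time < length(dists) || return dists
            dists[time + 1] = dists[time]
            time += 1
        end
    end
    return dists
end

# Mutate one column of a matrix through a SubArray, as the report suspects matters.
board = zeros(Int, 11, 2)
fly!(view(board, :, 1), Reindeer(14, 3, 2))
```

Because every store reads the element written just before it, reordering or dropping a single assignment would corrupt everything after it, which matches the symptom described above.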

iamed2 commented 8 years ago

In my Gallium Julia build running LLVM 3.8, it appears to be fixed. I'm not sure that proves anything, as that branch is very different. I'll build master with 3.7 and see if that works.

Is LLVM going to be upgraded for 0.4 or is that only going to happen for 0.5?

iamed2 commented 8 years ago

Looks like master with 3.7 works. It's also surprisingly much faster! So that's nice.

KristofferC commented 7 years ago

Please comment if this shows up again.