Open Quuxplusone opened 10 years ago
Attached dlaed3_-fcf4b8.c
(15041 bytes, text/plain): preprocessed source
Attached dlaed3_-fcf4b8.sh
(412 bytes, text/plain): run script
Testcase reduces to just this:
a;
dlaed3_(double *q, double *dlamda, double *w) {
  int b;
  static c, j;
  --dlamda;
  -a;
  for (;; ++j) {
    b = j - 1;
    for (; c <= b; ++c)
      w[c] = q[j] - dlamda[j];
  }
}
Strangely enough, it seems to be fixed by:
http://llvm.org/viewvc/llvm-project?view=revision&revision=205264
It also fixes a very similar-looking bug reported by a user of the FreeBSD
editors/libreoffice port here:
http://www.freebsd.org/cgi/query-pr.cgi?pr=187177
Hal, I've put you on CC since you are the author of that commit. Any idea if
the commit might be just hiding some other problem?
>
> Hal, I've put you on CC since you are the author of that commit. Any idea
> if the commit might be just hiding some other problem?
That commit did not fix anything, but did change some pass ordering. I'm fairly
certain that anything "fixed" by that commit is now just hidden. If you compile
with -fno-unroll-loops does the bug come back?
(In reply to comment #4)
> If you compile with -fno-unroll-loops does the bug come back?
Yep, with trunk r206915 and -fno-unroll-loops, it bombs again:
$ /share/dim/llvm/206915-trunk-freebsd11-i386-ninja-rel-1/bin/clang -cc1 \
    -triple i386-unknown-freebsd11.0 -emit-obj -disable-free \
    -main-file-name pr19029-reduced.c -mrelocation-model pic -pic-level 2 \
    -mdisable-fp-elim -relaxed-aliasing -masm-verbose -mconstructor-aliases \
    -target-cpu athlon64 -O2 -ferror-limit 19 -fmessage-length 191 \
    -mstackrealign -fobjc-runtime=gnustep -fdiagnostics-show-option \
    -fcolor-diagnostics -vectorize-loops -vectorize-slp -fno-unroll-loops \
    -x c pr19029-reduced.c
pr19029-reduced.c:1:1: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
a;
^
pr19029-reduced.c:2:1: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
dlaed3_(double *q, double *dlamda, double *w) {
^~~~~~~
pr19029-reduced.c:4:10: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
  static c, j;
  ~~~~~~ ^
pr19029-reduced.c:4:13: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
  static c, j;
  ~~~~~~    ^
pr19029-reduced.c:6:3: warning: expression result unused [-Wunused-value]
  -a;
  ^~
Instruction does not dominate all uses!
%27 = getelementptr inbounds double* %dlamda, i32 %6
%bound112 = icmp ule double* %27, %scevgep6
Instruction does not dominate all uses!
%27 = getelementptr inbounds double* %dlamda, i32 %6
%bound011 = icmp ule double* %scevgep, %27
Instruction does not dominate all uses!
%25 = getelementptr inbounds double* %q, i32 %3
%bound1 = icmp ule double* %25, %scevgep6
Instruction does not dominate all uses!
%25 = getelementptr inbounds double* %q, i32 %3
%bound0 = icmp ule double* %scevgep, %25
fatal error: error in backend: Broken function found, compilation aborted!
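For readers unfamiliar with the verifier message above: "Instruction does not dominate all uses!" means a value is used in a block that can be reached without passing through the value's definition. The following is only a toy sketch of that rule, not LLVM's verifier; the CFG, the block names and the value names are all made up for illustration, and ordering inside a single block is ignored.

```python
def dominators(cfg, entry):
    """Return {block: set of blocks dominating it} for cfg = {block: [successors]}.

    Plain iterative data-flow; adequate for tiny illustrative graphs only.
    """
    blocks = set(cfg)
    preds = {b: set() for b in blocks}
    for b, succs in cfg.items():
        for s in succs:
            preds[s].add(b)
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            incoming = [dom[p] for p in preds[b]]
            new = {b} | (set.intersection(*incoming) if incoming else set())
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


def broken_uses(cfg, entry, defs, uses):
    """defs: {value: defining block}; uses: [(value, using block)].

    Return the uses whose defining block does not dominate the using block.
    """
    dom = dominators(cfg, entry)
    return [(v, b) for v, b in uses if defs[v] not in dom[b]]


# Hypothetical diamond CFG: entry -> (left | right) -> merge.
cfg = {"entry": ["left", "right"], "left": ["merge"],
       "right": ["merge"], "merge": []}
defs = {"%gep": "left", "%n": "entry"}        # made-up value names
uses = [("%gep", "merge"), ("%n", "merge")]
print(broken_uses(cfg, "entry", defs, uses))  # [('%gep', 'merge')]
```

Here `%gep` is defined only on the `left` path but used in `merge`, which is the same shape of problem as the `getelementptr` values feeding the `%bound` comparisons in the log.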
(In reply to comment #5)
...
> Yep, with trunk r206915 and -fno-unroll-loops, it bombs again:
By bisecting backwards, I found out this error seems to have been introduced
here:
http://llvm.org/viewvc/llvm-project?view=revision&revision=189858
"Enable late-vectorization by default. This patch changes the default setting
for the LateVectorization flag that controls where the loop-vectorizer is ran."
I guess the actual bug is yet another side-effect exposed by this change?
Nadav, since you authored r189858, I've put you on CC too, do you have any idea?
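The backwards bisection described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used: the lower bound `189000` and the `crashes` predicate are stand-ins, and a real predicate would build clang at that revision and run the reproducer. It also assumes the failure persists once introduced, which this very bug violates unless the masking pass-order changes are pinned with flags such as -fno-unroll-loops.

```python
def first_bad_revision(good, bad, is_bad):
    """Return the smallest revision in (good, bad] for which is_bad() is true.

    Assumes `good` is known good, `bad` is known bad, and that every revision
    after the first bad one is also bad.
    """
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(mid):
            bad = mid
        else:
            good = mid
    return bad


# Stand-in predicate for illustration only; it hard-codes the answer that a
# real build-and-test step would have to discover.
tested = []

def crashes(rev):
    tested.append(rev)
    return rev >= 189858


print(first_bad_revision(189000, 206915, crashes))  # 189858
print("revisions tested:", len(tested))
```

Each step halves the remaining range, so even a span of many thousands of revisions needs only a handful of builds.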
Attached pr19029-2.cpp
(726 bytes, application/octet-stream): More general testcase, reproduces with any target CPU
(In reply to comment #6)
> By bisecting backwards, I found out this error seems to have been introduced
> here:
>
> http://llvm.org/viewvc/llvm-project?view=revision&revision=189858
>
> "Enable late-vectorization by default. This patch changes the default
> setting for the LateVectorization flag that controls where the
> loop-vectorizer is ran."
So when forcing late vectorization on with -mllvm -late-vectorize=true, I
searched backwards again, and this time ended up at an earlier revision (again
by Nadav), which seems to introduce the crash:
http://llvm.org/viewvc/llvm-project?view=revision&revision=189539
"This patch moves the SLP-vectorizer and BB-vectorizer back into SCC passes"
I'm not sure if there is any option I can enable for earlier revisions, to
partially undo this, so I can figure out where the actual problem originates?
For completeness' sake, both testcases can be reproduced by using the following
flags:
clang -cc1 -triple x86_64-unknown-freebsd11.0 -emit-obj -O2 -vectorize-loops -mllvm -late-vectorize=true
The actual triple does not matter too much, I also tried:
* i386-unknown-freebsd11.0
* i386-unknown-linux
* x86_64-unknown-linux
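Checking several triples could be scripted along these lines. This is a sketch under assumptions: `clang` is taken to be on PATH, the helper names are made up, and the source file argument is added here because the flag list above omits it. The flags themselves are the ones quoted in this comment.

```python
import subprocess

# Flags as quoted above; the triple and source file are filled in per run.
FLAGS = ["-emit-obj", "-O2", "-vectorize-loops", "-mllvm", "-late-vectorize=true"]
TRIPLES = [
    "x86_64-unknown-freebsd11.0",
    "i386-unknown-freebsd11.0",
    "i386-unknown-linux",
    "x86_64-unknown-linux",
]


def cc1_command(triple, source, clang="clang"):
    """Build the clang -cc1 command line for one triple (hypothetical helper)."""
    return [clang, "-cc1", "-triple", triple] + FLAGS + [source]


def reproduces(triple, source, run=subprocess.run):
    """True if the verifier error from this report shows up on stderr."""
    proc = run(cc1_command(triple, source), capture_output=True, text=True)
    return "does not dominate all uses" in proc.stderr


# Example use, left commented out because it needs clang and the testcase:
# for triple in TRIPLES:
#     print(triple, reproduces(triple, "pr19029-1.c"))
```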
It looks like a bug in the loop-vectorizer. Can you reduce the test case to a bitcode file?
Attached pr19029-1.ll
(6016 bytes, application/octet-stream): .ll version of first reduced testcase
Attached pr19029-2.ll
(7985 bytes, application/octet-stream): .ll version of second reduced testcase
There is a clang flag for printing the IR before every transformation. I think that the generated LL file that you attached is already invalid. We need to catch it before it becomes invalid.
The flag appears to be -mllvm -print-before-all, but most of the 79
intermediate IR files don't seem to be complete, e.g. the very first one prints:
llvm-as: temp01.ll:12:41: error: use of undefined metadata '!0'
%5 = load double** %3, align 8, !tbaa !0
^
Others result in errors like:
llvm-as: temp24.ll:3:8: error: expected 'type' after '='
%5 = load i32* @dlaed3_.c, align 4, !tbaa !0
^
The pass numbers that do work without errors are:
08: *** IR Dump Before Interprocedural Sparse Conditional Constant Propagation
09: *** IR Dump Before Dead Argument Elimination
60: *** IR Dump Before Function Integration/Inlining ***printing a <null> value
61: *** IR Dump Before Deduce function attributes ***printing a <null> value
62: *** IR Dump Before A No-Op Barrier Pass
Then pass 75 ('Before Strip Unused Function Prototypes') dies with the
'Instruction does not dominate all uses!' error. The previous pass is 'Before
Simplify the CFG', but the produced IR is apparently not valid.
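The per-dump check described above (which of the split files llvm-as accepts) could be automated roughly like this. It is a sketch only, assuming `llvm-as` is on PATH and the dumps are named like temp01.ll; the function names are made up for illustration.

```python
import os
import subprocess


def assembles(path):
    """True if llvm-as accepts the file; the bitcode output is discarded."""
    proc = subprocess.run(["llvm-as", path, "-o", os.devnull],
                          capture_output=True, text=True)
    return proc.returncode == 0


def valid_dumps(paths, check=assembles):
    """Return, in sorted order, the paths for which check() succeeds."""
    return [p for p in sorted(paths) if check(p)]


# Example use, left commented out because it needs llvm-as and the dump files:
# import glob
# for path in valid_dumps(glob.glob("temp*.ll")):
#     print(path)
```

Passing the checker in as a parameter keeps the filtering logic testable without an LLVM installation.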
What was the last pass that finished successfully? You can manually place a breakpoint before that pass and dump the module.
(In reply to comment #19)
> How did you generate the files? If we're to isolate the bug, we need to be
> able to run the optimization pass so that it generates the bad output.
I couldn't get bugpoint to work (it tries to run /usr/bin/gcc, which does not
exist on my system... :), so I used -mllvm -print-before-all as a clang option,
e.g.:
clang -cc1 -triple x86_64-unknown-freebsd11.0 -emit-obj -O2 -vectorize-loops -mllvm -late-vectorize=true -mllvm -print-before-all pr19029-1.c 2> irdumps.txt
This logs all the IR into irdumps.txt. I use the following python fragment to
split out the dumps in separate files:
#!/usr/bin/env python
# Split the combined -print-before-all log into one .ll file per dump.
irfile = open('irdumps.txt', 'r')
counter = 0
outfile = None
for line in irfile:
    if line.startswith('*** IR Dump'):
        # New dump header: close the previous file and start the next one.
        counter += 1
        if outfile:
            outfile.close()
        print('Opening output file %d...' % counter)
        outfile = open('temp%02d.ll' % counter, 'w')
        outfile.write('; %s' % line)
    elif outfile:
        outfile.write(line)
if outfile:
    outfile.close()
irfile.close()
Unfortunately, not every pass logs the full IR, for some reason, so not every
individual dump is useful at this time. Nadav suggested instead running clang
in gdb and setting a breakpoint on the pass manager, but I'm not sure how to
dump the current IR to a file from gdb...
Attached pr19029-1-ir.tar.gz
(9041 bytes, application/x-tar): Tarball with intermediate .ll files
Attached pr19029-1-FPPassManager-Loop_Vectorization-before.ll
(2958 bytes, application/octet-stream): IR of pr19029-1 just before FPPassManager's Loop Vectorization pass
Attached pr19029-1-FPPassManager-Loop_Vectorization-after.ll
(8618 bytes, application/octet-stream): IR of pr19029-1 after FPPassManager's Loop Vectorization pass
Note that LoopVectorize::runOnFunction() calls processLoop() only once. Before the call, the module is still OK; after the call, it is broken.
Some more investigation shows that LoopVectorize::processLoop() calls InnerLoopVectorizer::vectorize(). This first calls InnerLoopVectorizer::createEmptyLoop(), after which the IR is already bad. This is not the case before the createEmptyLoop() call.
I'm not sure if the IR is supposed to be consistent throughout the InnerLoopVectorizer implementation, however...
Nadav, do you need any other .ll output? I think attachment 12445 is the last stage before the LoopVectorizer does something bad to the IR.
Ping :)
Ping 2 :)
Turns out this finally got fixed in https://reviews.llvm.org/rL229419 ("Run LICM as part of the cleanup phase from the scalar optimizer") by James Molloy.