x86-domain-reassignment pass causes dreadful compile time slowdown when generating code for skylake-avx512 architecture #41142

Open Quuxplusone opened 5 years ago

Quuxplusone commented 5 years ago
Bugzilla Link PR42172
Status NEW
Importance P normal
Reported by Kevin W. Harris (Kevin.Harris@unisys.com)
Reported on 2019-06-06 14:25:35 -0700
Last modified on 2019-06-06 14:36:56 -0700
Version trunk
Hardware PC Linux
CC craig.topper@gmail.com, Kevin.Harris@unisys.com, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments i131323820_f000203011542_1.bc (7372 bytes, application/octet-stream)
i141043225_f400004013147_1.bc (165952 bytes, application/octet-stream)
i140409520_f400004002276_1.bc (600328 bytes, application/octet-stream)
Blocks
Blocked by
See also
Created attachment 22078
the small case described above

At Unisys, we have an LLVM-based JIT that generates x86-64 code from instruction
sequences of one of our historical architectures.  We recently started
testing on servers with the skylake-avx512 architecture and encountered a
shocking compile-time slowdown when compiling for this target architecture.
Using pass timings, we isolated the problem to the x86-domain-reassignment
pass, and confirmed it by disabling this pass and seeing a large compile-time
speedup.  A scatter plot of 31K+ examples strongly suggests that an n-squared
algorithm is the culprit, since the slowdown grows with the size of the IR.
I provide three examples, a small one, a medium-sized one, and a large one:

i131323820_f000203011542_1.bc - 712 object code bytes
i141043225_f400004013147_1.bc - 40376 object code bytes
i140409520_f400004002276_1.bc - 129424 object code bytes

The opt+llc pipeline that I ran for these cases showed the following slowdowns
when the x86-domain-reassignment pass is used (time with the pass disabled to
time with it enabled):

small case: 0.013 secs to 0.017 secs
middle case: 1.043 secs to 7.981 secs
large case: 4.140 secs to 69.689 secs
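
As a rough check on the n-squared claim, the sketch below recomputes the
slowdown factors from the three timings above and estimates a scaling exponent
from the extra time the pass adds.  It assumes that object-code size is a usable
proxy for IR size and it relies on single measurements, so the result is
indicative only.  The names CASES, slowdown, and scaling_exponent are made up
for this illustration.

```python
import math

# (object-code bytes, seconds with the pass disabled, seconds with it enabled),
# copied from the figures reported in this bug.
CASES = {
    "small": (712, 0.013, 0.017),
    "middle": (40376, 1.043, 7.981),
    "large": (129424, 4.140, 69.689),
}


def slowdown(case):
    """Ratio of compile time with the pass enabled to time with it disabled."""
    _, fast, slow = CASES[case]
    return slow / fast


def scaling_exponent(case_a, case_b):
    """Estimate k in extra_time ~ size**k from two cases.

    Assumption: object-code size stands in for IR size.
    """
    size_a, fast_a, slow_a = CASES[case_a]
    size_b, fast_b, slow_b = CASES[case_b]
    extra_a = slow_a - fast_a  # time attributable to the pass
    extra_b = slow_b - fast_b
    return math.log(extra_b / extra_a) / math.log(size_b / size_a)


if __name__ == "__main__":
    for name in CASES:
        print(f"{name}: {slowdown(name):.1f}x slower with the pass enabled")
    k = scaling_exponent("middle", "large")
    print(f"estimated exponent (middle -> large): {k:.2f}")
```

For the middle and large cases this gives an exponent of roughly 1.9, which is
consistent with quadratic growth in the pass.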

The opt+llc pipeline that I used for these comparisons looks like this:

$LLVMPATH/opt -O3 -enable-tbaa -mcpu=skylake-avx512 $INFILE |
  $LLVMPATH/llc -O3 -enable-tbaa -filetype=obj -o=out.o -mcpu=skylake-avx512 -

where $INFILE is one of the 3 files listed above.  These commands generated the
slow times noted above.  The fast times were obtained by adding the
-disable-x86-domain-reassignment option to the llc command above.
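
For anyone who wants to reproduce the comparison, here is a minimal sketch of a
timing harness that runs the same opt | llc pipeline with and without the
option.  It measures wall-clock time for the whole pipeline, which may differ
from how the numbers in this report were collected.  The function names and the
command-line handling are hypothetical; the tool flags are the ones quoted in
this report.

```python
import subprocess
import sys
import time

DISABLE_FLAG = "-disable-x86-domain-reassignment"


def build_commands(llvm_path, infile, disable_pass):
    """Return the (opt, llc) argument lists for the pipeline in this report."""
    opt_cmd = [f"{llvm_path}/opt", "-O3", "-enable-tbaa",
               "-mcpu=skylake-avx512", infile]
    llc_cmd = [f"{llvm_path}/llc", "-O3", "-enable-tbaa", "-filetype=obj",
               "-o=out.o", "-mcpu=skylake-avx512"]
    if disable_pass:
        llc_cmd.append(DISABLE_FLAG)
    llc_cmd.append("-")  # read the bitcode produced by opt from stdin
    return opt_cmd, llc_cmd


def time_pipeline(llvm_path, infile, disable_pass):
    """Run opt | llc once and return the elapsed wall-clock seconds."""
    opt_cmd, llc_cmd = build_commands(llvm_path, infile, disable_pass)
    start = time.perf_counter()
    opt = subprocess.Popen(opt_cmd, stdout=subprocess.PIPE)
    llc = subprocess.Popen(llc_cmd, stdin=opt.stdout)
    opt.stdout.close()  # let opt see a closed pipe if llc exits early
    llc_rc = llc.wait()
    opt_rc = opt.wait()
    elapsed = time.perf_counter() - start
    if opt_rc != 0 or llc_rc != 0:
        raise RuntimeError(f"pipeline failed (opt={opt_rc}, llc={llc_rc})")
    return elapsed


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: time_pipeline.py LLVMPATH INFILE")
    else:
        llvm_path, infile = sys.argv[1], sys.argv[2]
        slow = time_pipeline(llvm_path, infile, disable_pass=False)
        fast = time_pipeline(llvm_path, infile, disable_pass=True)
        print(f"pass enabled:  {slow:.3f} s")
        print(f"pass disabled: {fast:.3f} s")
```

Running it once per attachment should show whether the gap between the two
times widens with input size on a given LLVM build.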

I measured the object size for each of the 31K+ bitcode files that we used for
this experiment, and saw no change in object code size with or without the
-disable-x86-domain-reassignment option, so I'm presuming that this extra
compile time gives us no benefit.
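
The size comparison could be scripted along these lines.  Equal sizes do not by
themselves prove that the generated code is identical, so this sketch also does
a byte-for-byte comparison, which goes beyond what was measured for this report.
The function name and the paths passed on the command line are hypothetical.

```python
import filecmp
import os
import sys


def compare_objects(path_enabled, path_disabled):
    """Return (same_size, same_bytes) for two object files."""
    same_size = os.path.getsize(path_enabled) == os.path.getsize(path_disabled)
    # shallow=False forces a content comparison instead of trusting os.stat().
    same_bytes = same_size and filecmp.cmp(
        path_enabled, path_disabled, shallow=False
    )
    return same_size, same_bytes


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: compare_objects.py OBJ_PASS_ENABLED OBJ_PASS_DISABLED")
    else:
        same_size, same_bytes = compare_objects(sys.argv[1], sys.argv[2])
        print(f"same size:  {same_size}")
        print(f"same bytes: {same_bytes}")
```

If the bytes differ while the sizes match, the pass did change the code, and a
runtime benchmark would be needed before concluding that it gives no benefit.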

Please let me know if I can provide any additional assistance in resolving this
problem.
Quuxplusone commented 5 years ago

Attached i131323820_f000203011542_1.bc (7372 bytes, application/octet-stream): the small case described above

Quuxplusone commented 5 years ago

Attached i141043225_f400004013147_1.bc (165952 bytes, application/octet-stream): the middle size case

Quuxplusone commented 5 years ago

Attached i140409520_f400004002276_1.bc (600328 bytes, application/octet-stream): the large size case

Quuxplusone commented 5 years ago
These experiments were run against LLVM 7.0.0, but I see no changes to
lib/Target/X86/X86DomainReassignment.cpp since then that would alter this
behavior, so I filed it as a bug against trunk.  I searched the bug list for
AVX512 problems and compile-time problems and didn't find any previous report
of this one.
     -Kevin