Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

For some bitcodes it can take 12 hours to read and compile #46364

Open Quuxplusone opened 4 years ago

Quuxplusone commented 4 years ago
Bugzilla Link PR47395
Status CONFIRMED
Importance P enhancement
Reported by Scott Waye (scott.waye@hubse.com)
Reported on 2020-09-02 06:43:59 -0700
Last modified on 2020-09-03 05:37:21 -0700
Version trunk
Hardware PC Windows NT
CC dblaikie@gmail.com, efriedma@quicinc.com, llvm-bugs@lists.llvm.org, tlively@google.com, yuanfang.chen@sony.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
I create bitcode using libLLVM for the corert compiler project
(https://github.com/dotnet/corert).  It uses the c# bindings over libLLVM from
https://github.com/Microsoft/LLVMSharp.

I have 2 bitcodes generated from mostly the same source code.  They are around
240MB in size.  One compiles in 3 minutes, the other in 12 hours.  I suspect
the 12 hour compilation is either not optimal or doing something wrong.  I use
emscripten to compile and this ultimately calls

E:/GitHub/llvm-project/build/release/bin/clang++.exe -target wasm32-unknown-
emscripten -D__EMSCRIPTEN_major__=1 -D__EMSCRIPTEN_minor__=39 -
D__EMSCRIPTEN_tiny__=19 -D_LIBCPP_ABI_VERSION=2 -Dunix -D__unix -D__unix__ -
Werror=implicit-function-declaration -Xclang -nostdsysteminc -Xclang -
isystemE:\GitHub\emsdk\upstream\emscripten\system\include\libcxx -Xclang -
isystemE:\GitHub\emsdk\upstream\emscripten\system\lib\libcxxabi\include -Xclang
-isystemE:\GitHub\emsdk\upstream\emscripten\system\lib\libunwind\include -
Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\include\compat -
Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\include -Xclang -
isystemE:\GitHub\emsdk\upstream\emscripten\system\include\libc -Xclang -
isystemE:\GitHub\emsdk\upstream\emscripten\system\lib\libc\musl\arch\emscripten
-Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\local\include -
Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\include\SSE -Xclang -
isystemE:\GitHub\emsdk\upstream\emscripten\cache\wasm\include -DEMSCRIPTEN -
fignore-exceptions -c -g
E:\GitHub\UnoCoreRt\UnoCoreRt.Wasm\bin\Debug\netstandard2.0\UnoCoreRt.Wasm.bc -
Xclang -isystemE:\GitHub\emsdk\upstream\emscripten\system\include\SDL -c -o
E:\GitHub\UnoCoreRt\UnoCoreRt.Wasm\bin\Debug\netstandard2.0\UnoCoreRt-release.o
-mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-sjlj -
mllvm -disable-lsr -g

What I've noticed is that compared to the "fast", 3 minute compile, the "slow"
compile makes around 1 million calls to https://github.com/llvm/llvm-
project/blob/a6eb70c052da767aef6b041d0db20bdf3a9e06b5/llvm/lib/Bitcode/Reader/ValueList.cpp#L89
and hence the ResolveConstants variable ends up with that many entries.
Resolving these constants is then what seems to take most of the time.  I think
the bitcode reader is identifying 1 million forward references so possible
causes of the slowness that come to mind are:

1.  Incorrect identification of forward references
2.  Incorrect writing from libLLVM that creates forward references unnecessarily
3.  Slow algorithm to resolve correctly identified and written forward
references.

A copy of the bitcode is at http://dev.hubse.com/UnoCoreRt.Wasm.bc.msi   (its
not really an msi, just needed a binary extension that the web server would
serve).  File is actually a .7z compressed file, so needs renaming from .msi to
.7z

I privately messaged @tlively in discord and I believe he has confirmed that it
takes a long time for him also.

https://bugs.llvm.org/show_bug.cgi?id=46750  looks to be the same area of code,
but not the same problem.

I did spend a bit of time with clang++  in the debugger, but I'm not that
familiar with it at all, so I couldn't make any conclusion about my 3 theories
above.
Quuxplusone commented 4 years ago

Thanks for the report! Yes, I was able to reproduce this problem. It would be great if someone more familiar with this part of the code could take a look, though :)

Quuxplusone commented 4 years ago

If it's taking hours to read a 240MB bitcode file, there must be some non-linear algorithm involved. Looking at BitcodeReaderValueList::resolveConstantForwardRefs , I can only come up with one potential source of non-linearity: calling replaceAllUsesWith() on every element of a large array could be O(N^2). (It looks like the original author realized this was a possibility, which is why it doesn't RAUW the placeholder itself. But the reasoning was applied incompletely.)

Does the testcase have an array with a lot of constant expression operands somewhere?


It's possible we could also change the way we write bitcode files to avoid forward references more aggressively; not sure how the ordering works off the top of my head.

Quuxplusone commented 4 years ago
Yes, there are quite a few e.g.

@__FrozenSegmentRegionStart = global [278553 x i32*] [i32* null, i32* bitcast
([18 x i32*]* @__EEType_String to i32*), i32* null, i32* null, i32* null

There's some even bigger ones, but vim is having a hard time with the length of
the lines, but there's at least one with 1.7 million combinations of
bitcast(i8* getelementptr....
Quuxplusone commented 4 years ago
(In reply to Scott Waye from comment #3)
> There's some even bigger ones, but vim is having a hard time with the length
> of the lines

The POSIX fold command should be able to handle arbitrarily long lines.

How are you dumping the IR, though?  If you're using llvm-dis, how long did it
take?
Quuxplusone commented 4 years ago
I'm using libLLVM at the point of finishing the compilation (of c# => LLVM IR)
with

#if DEBUG
            Module.PrintToFile(Path.ChangeExtension(_objectFilePath, ".txt"));
#endif //DEBUG

The PrintToFile dumps the textual representation and it completes in a normal
amount of time given that its 1.5GB.

This method goes to

        [DllImport(LibraryPath, CallingConvention = CallingConvention.Cdecl, EntryPoint = "LLVMPrintModuleToFile", ExactSpelling = true)]
        [return: NativeTypeName("LLVMBool")]
        public static extern int PrintModuleToFile([NativeTypeName("LLVMModuleRef")] LLVMOpaqueModule* M, [NativeTypeName("const char *")] sbyte* Filename, [NativeTypeName("char **")] sbyte** ErrorMessage);

i.e.  LLVMPrintModuleToFile in libLLVM
Quuxplusone commented 4 years ago

by "normal" I've not actually measured it, but I think its around 30 seconds.

Quuxplusone commented 4 years ago
here's the first few lines of that large array

@__embedded_metadata = global [1713228 x i32*] [i32* bitcast (i8* getelementptr
(i8, i8* null, i32 -559030275) to i32*), i32* bitcast (i8* getelementptr (i8, i8
* null, i32 1493321100) to i32*), i32* bitcast (i8* getelementptr (i8, i8* null,
 i32 621213955) to i32*), i32* bitcast (i8* getelementptr (i8, i8* null, i32 -92
1973496) to i32*), i32* bitcast (i8* getelementptr (i8, i8* null, i32 420509964)
 to i32*), i32* bitcast (i8* getelementptr (i8, i8* null, i32 1360507156) to i32
*), i32* bitcast (i8* getelementptr (i8, i8* null, i32 -920712933) to i32*), i32
* bitcast (i8* getelementptr (i8, i8* null, i32 1227260194) to i32*), i32* bitca
st (i8* getelementptr (i8, i8* null, i32 -1121234646) to i32*), i32* bitcast (i8
Quuxplusone commented 4 years ago

while I've got you, this array is basically just packing numbers. If there's a more efficient way to do that, over a bunch of geps, I'm all ears.

Quuxplusone commented 4 years ago

Probably you should be using inttoptr instead of gep on a null pointer.

I have a patch I think should fix this; will post soon.

Quuxplusone commented 4 years ago

https://reviews.llvm.org/D87063

Quuxplusone commented 4 years ago

Thanks! I can confirm that with this patch the compile time is back to 3 minutes.