Quuxplusone / LLVMBugzillaTest

clang 2.9, MSYS: compilation of a simple "hello world" code takes 10s #10000

Open Quuxplusone opened 13 years ago

Quuxplusone commented 13 years ago
Bugzilla Link: PR9667
Status: NEW
Importance: P normal
Reported by: Vincent Torri (vincent.torri@gmail.com)
Reported on: 2011-04-10 05:30:44 -0700
Last modified on: 2011-11-16 06:02:27 -0800
Version: unspecified
Hardware: PC Windows XP
CC: efriedma@quicinc.com, geek4civic@gmail.com, jiri.spitz@bluetone.cz, llvm-bugs@lists.llvm.org
Fixed by commit(s):
Attachments:
Blocks:
Blocked by:
See also:

I compiled LLVM and clang on Windows, using MSYS. Here are the steps I followed:

I unpacked the LLVM tarball.
Then I unpacked the clang tarball in llvm/tools/.
I renamed llvm/tools/clang-2.9 to llvm/tools/clang.
I created build/ in llvm/.
I entered llvm/build.
Then I ran ../configure --prefix=/usr/llvm --enable-jit --with-built-clang

I think that --with-built-clang is unnecessary.

Here are the settings:

$ llc --version
Low Level Virtual Machine (http://llvm.org/):
  llvm version 2.9
  Optimized build.
  Built Apr  8 2011 (09:07:34).
  Host: i686-pc-mingw32
  Host CPU: core2

  Registered Targets:
    alpha   - Alpha [experimental]
    arm     - ARM
    bfin    - Analog Devices Blackfin [experimental]
    c       - C backend
    cellspu - STI CBEA Cell SPU [experimental]
    cpp     - C++ backend
    mblaze  - MBlaze
    mips    - Mips
    mipsel  - Mipsel
    msp430  - MSP430 [experimental]
    ppc32   - PowerPC 32
    ppc64   - PowerPC 64
    ptx     - PTX
    sparc   - Sparc
    sparcv9 - Sparc V9
    systemz - SystemZ
    thumb   - Thumb
    x86     - 32-bit X86: Pentium-Pro and above
    x86-64  - 64-bit X86: EM64T and AMD64
    xcore   - XCore

$ clang --version
clang version 2.9 (tags/RELEASE_29/final)
Target: i686-pc-mingw32
Thread model: posix

If I compile a simple hello world program, it takes around 10s with clang and
around 2s with gcc. Even clang --version takes a bit more than 5s to display
its information.

I can provide more details if needed.
Quuxplusone commented 13 years ago

Have you tried "configure --enable-optimized" or "make ENABLE_OPTIMIZED=1"?

Quuxplusone commented 13 years ago

configure help says that it is enabled by default, and in my original post, llc --version says "Optimized build."

Quuxplusone commented 13 years ago
(In reply to comment #2)
> configure help says that it is enabled by default, and in my original post,
> llc --version says "Optimized build."

Ah, yes, it is enabled by default in the 2.9 release. My mistake.
Quuxplusone commented 13 years ago

Hmm... what's the output if you use "clang++ -ftime-report"?

All of your numbers sound very high; for comparison, compiling a "Hello World" in C++ on my machine takes roughly 0.2 seconds with either gcc or clang.

Quuxplusone commented 13 years ago
$ clang -ftime-report -o hello hello.c
===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0000 seconds (0.0000 wall clock)

   ---Wall Time---  --- Name ---
        -----       Vector Legalization
        -----       Type Legalization
        -----       Instruction Selection
        -----       Instruction Scheduling Cleanup
        -----       Instruction Scheduling
        -----       Instruction Creation
        -----       DAG Legalization
        -----       DAG Combining 2
        -----       DAG Combining 1
        -----       Total

===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0000 seconds (0.0000 wall clock)

   ---Wall Time---  --- Name ---
        -----       DWARF Debug Writer
        -----       Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0100 seconds (0.0100 wall clock)

   ---User Time---   --User+System--   ---Wall Time---  --- Name ---
   0.0100 (100.0%)   0.0100 (100.0%)   0.0100 (100.0%)  Expand ISel Pseudo-instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  X86 PIC Global Base Reg Initialization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  X86 Maximal Stack Alignment Check
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  X86 FP Stackifier
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  X86 DAG->DAG Instruction Selection
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  X86 AT&T-Style Assembly Printer
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Two-Address instruction pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Subregister lowering instruction pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Remove unreachable blocks from the CFG
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Remove unreachable blocks from the CFG
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Prolog/Epilog Insertion & Frame Finalization
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Function Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Lower invoke and unwind, for unwindless code generators
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Lower Garbage Collection Instructions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Local Stack Slot Allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Insert stack protectors
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Inliner for always_inline functions
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Fast Register Allocator
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Eliminate PHI nodes for register allocation
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Delete Garbage Collector Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Bundle Machine CFG Edges
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Basic CallGraph Construction
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Analyze Machine Code For Garbage Collection
   0.0100 (100.0%)   0.0100 (100.0%)   0.0100 (100.0%)  Total

===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0100 ( 50.0%)   0.0200 (100.0%)   0.0300 ( 75.0%)   0.0601 ( 60.0%)  Clang front-end timer
   0.0100 ( 50.0%)   0.0000 (  0.0%)   0.0100 ( 25.0%)   0.0401 ( 40.0%)  Code Generation Time
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  LLVM IR Generation Time
   0.0200 (100.0%)   0.0200 (100.0%)   0.0401 (100.0%)   0.1001 (100.0%)  Total

I have run the command several times, but I no longer get numbers; I
get "-----" instead.

time reports the following:

$ time clang -o hello hello.c

real    0m11.410s
user    0m0.010s
sys     0m0.050s
Quuxplusone commented 13 years ago

I forgot to mention that I run Windows XP in VirtualBox. Anyway, the difference between gcc and clang is huge.

Quuxplusone commented 13 years ago

Are you sure your VM isn't misconfigured somehow? The only thing I can think of is that clang is spending the time waiting on I/O.
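
The reasoning here can be checked against the `time` output above: 11.410 s of wall-clock time against only 0.010 s user plus 0.050 s system means over 99% of the run was spent off the CPU. Below is a minimal Python sketch of that comparison. The helper names `wall_vs_cpu` and `waiting_fraction` are hypothetical and not part of clang or LLVM. It assumes a POSIX-like host, because `os.times()` reports zero for child CPU time on Windows.

```python
import os
import subprocess
import sys
import time


def wall_vs_cpu(cmd):
    """Run cmd and return (wall_seconds, child_cpu_seconds).

    Hypothetical helper: child CPU time comes from os.times(), which
    only reports it on POSIX-like systems (it is zero on Windows).
    """
    before = os.times()
    start = time.perf_counter()
    subprocess.run(cmd, check=False,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    wall = time.perf_counter() - start
    after = os.times()
    cpu = ((after.children_user - before.children_user)
           + (after.children_system - before.children_system))
    return wall, cpu


def waiting_fraction(wall, cpu):
    """Share of wall-clock time not accounted for by CPU time."""
    if wall <= 0:
        return 0.0
    return max(0.0, 1.0 - cpu / wall)


if __name__ == "__main__":
    # Figures taken from the `time clang -o hello hello.c` output above.
    real, user, system = 11.410, 0.010, 0.050
    print(f"waiting fraction: {waiting_fraction(real, user + system):.4f}")

    # Live measurement of any command given on the command line,
    # for example: python this_script.py clang --version
    if len(sys.argv) > 1:
        wall, cpu = wall_vs_cpu(sys.argv[1:])
        print(f"wall={wall:.3f}s cpu={cpu:.3f}s "
              f"waiting={waiting_fraction(wall, cpu):.4f}")
```

A waiting fraction close to 1 would be consistent with time lost to I/O, process startup, or dynamic loading rather than to the compiler passes themselves, though it does not say which of those is responsible.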

Quuxplusone commented 13 years ago
The clang and LLVM passes do not appear to be the culprit.

I suspect the bottleneck is in storage or network access.

(IIRC, clang++ touches c:/mingw/include and [curdrv]:/mingw/include)

Q1) Please show us your drive mappings (e.g. the output of mount).

Q2) Where is the source path (the directory containing configure), and where is the build path?

Q3) Please try "clang++ -v hello.c"

Q4) Please try "clang++ -integrated-as -v hello.c"

FYI, I have been building clang and LLVM on MinGW/MSYS for over half a year and
have not run into this issue.
Quuxplusone commented 13 years ago
I have tried the binary from the LLVM download page and there is no such slowdown
in compilation time (for code as small as the hello world program, gcc
and LLVM are about the same speed). There are differences in linking, though:
LLVM is statically linked into clang in the build from the download page.

I don't understand why my build is so slow. But if the devs think that this bug
can be closed, feel free to close it.

Just some thoughts:
 * Would it be possible to split the MinGW tarball into several parts (at least a -bin and a -dev part)?
 * Would it be possible to add a README to the MinGW tarball that describes the command line, and maybe some information on how it was built?

Thank you.
Quuxplusone commented 13 years ago

Oh! If you're doing a shared build, the time spent in the dynamic linker could explain the slowdown by itself. There's a reason the builds on the website aren't built that way. :)

Quuxplusone commented 12 years ago

I can confirm the same problem. Just running "clang --version" takes about 5 s with a shared build. With a static build it takes even longer: 9 s!

This problem is specific to MinGW. When using i686-mingw32-w32 or tdcc, this problem does not arise.

I started to observe this issue shortly before the LLVM 2.9 release.

Quuxplusone commented 12 years ago
clang on Cygwin has had the same issue.

I am not sure, but it might be resolved by linking against the *static* libstdc++.a.

That can reduce the startup time of clang.exe (a few seconds -> instant),
and it also fixed the very slow clang test runs (100 minutes -> 20 minutes).

HTH