Quuxplusone / LLVMBugzillaTest


X86DisassemblerDecoder tblgen generated code is hugely bloated with a ton of relocations #11019

Open Quuxplusone opened 12 years ago

Quuxplusone commented 12 years ago
Bugzilla Link PR11953
Status NEW
Importance P enhancement
Reported by Chris Lattner (clattner@nondot.org)
Reported on 2012-02-08 18:57:54 -0800
Last modified on 2019-05-20 15:58:12 -0700
Version 1.0
Hardware PC All
CC benny.kra@gmail.com, craig.topper@gmail.com, jryans@gmail.com, llvm-bugs@lists.llvm.org, pageexec@gmail.com, rafael@espindo.la

lib/Target/X86/Disassembler/Release/X86DisassemblerDecoder.o is by far the largest .o file generated in LLVM, weighing in at a whopping 1.9MB alone. The vast majority of this is static data (~1.2MB) but almost all of that (~950K) is mutable data that is getting relocated by the dynamic linker at startup time.

These tables really, really need to be shrunk in general, and the dynamic relocations (i.e., pointers in one statically allocated global variable to another one) should be eliminated if possible to reduce the startup time of apps that link them in.

-Chris
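
The relocation problem described above comes from tables that store the addresses of other static tables. Below is a minimal C sketch of the usual fix, assuming a simplified decision table: concatenate the sub-tables into one flat array and store integer offsets instead of pointers. The names (`InstrUID`, `kDecisionsPtr`, `kFlatTable`, etc.) and the ID values are hypothetical and are not tblgen's actual output.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t InstrUID; /* hypothetical instruction identifier */

/* Pointer-based layout: each decision stores the address of its own
   sub-array. In position-independent code every stored address needs a
   load-time relocation, so this array cannot live in plain read-only data. */
static const InstrUID kAddOps[] = { 10, 11 };
static const InstrUID kMovOps[] = { 20 };
struct DecisionWithPtr { uint8_t type; const InstrUID *ids; };
static const struct DecisionWithPtr kDecisionsPtr[] = {
  { 1, kAddOps }, /* type: hypothetical decision-kind tag, unused here */
  { 0, kMovOps },
};

/* Offset-based layout: all sub-arrays are concatenated into one flat table
   and each decision stores a plain integer index. Neither array contains an
   address, so both can stay in read-only data with no relocations. */
static const InstrUID kFlatTable[] = { 10, 11, /* ADD */ 20 /* MOV */ };
struct DecisionWithOffset { uint8_t type; uint16_t first; };
static const struct DecisionWithOffset kDecisionsOff[] = {
  { 1, 0 },
  { 0, 2 },
};

static InstrUID lookup_ptr(unsigned decision, unsigned entry) {
  return kDecisionsPtr[decision].ids[entry];
}

static InstrUID lookup_off(unsigned decision, unsigned entry) {
  /* Bounds check against the single flat table. */
  assert(kDecisionsOff[decision].first + entry <
         sizeof kFlatTable / sizeof kFlatTable[0]);
  return kFlatTable[kDecisionsOff[decision].first + entry];
}
```

Both lookups return the same IDs; the difference is only in what the loader has to patch. The cost of the offset form is one extra add per lookup and a table-size limit set by the offset width (16 bits here).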

Quuxplusone commented 12 years ago

One good thing about this is that it is all tblgen generated, which means that a little work on tblgen can probably produce huge returns.

Quuxplusone commented 12 years ago

I'm taking a look at this. I'll at least have the relocations fixed soon.

Quuxplusone commented 12 years ago

Relocations removed in r150161.

I think I can trivially reduce some of the 256-entry sub-tables in the modrm table to 32 entries or less.

Quuxplusone commented 12 years ago

I've reduced the size further by adding an intermediate split between SPLITRM and FULL. This uses 16 entries per opcode instead of the 256 that FULL uses. Committed in r150167.
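
A minimal C sketch of the idea: a generator picks the smallest ModRM sub-table that reproduces an opcode's full 256-entry table, and a decoder computes the index into it. Assumptions: the standard x86 ModRM layout (mod in bits 7-6, reg in bits 5-3, rm in bits 2-0); ONEENTRY is a trivial 1-entry case; SPLITRM distinguishes only the register form (mod == 3) from memory forms; the 16-entry split, called SPLITREG here (our name), indexes by the reg field separately for register and memory forms; entries are plain 16-bit instruction IDs. This is an illustration, not the code tblgen emits.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { ONEENTRY, SPLITRM, SPLITREG, FULL } ModRMType;

static unsigned modrm_mod(uint8_t m) { return m >> 6; }
static unsigned modrm_reg(uint8_t m) { return (m >> 3) & 7; }

/* Number of stored entries per opcode for each representation. */
static unsigned entries_for(ModRMType t) {
  switch (t) {
  case ONEENTRY: return 1;
  case SPLITRM:  return 2;
  case SPLITREG: return 16;
  default:       return 256;
  }
}

/* Index into the compact per-opcode sub-table for a given ModRM byte. */
static unsigned compact_index(ModRMType t, uint8_t modrm) {
  unsigned isReg = modrm_mod(modrm) == 3; /* register (mod == 3) form */
  switch (t) {
  case ONEENTRY: return 0;
  case SPLITRM:  return isReg;
  case SPLITREG: return isReg * 8 + modrm_reg(modrm);
  default:       return modrm;
  }
}

/* Return the smallest representation that reproduces all 256 entries:
   for each candidate, every ModRM value that maps to the same compact
   slot must have the same instruction ID in the full table. */
static ModRMType classify(const uint16_t full[256]) {
  static const ModRMType kOrder[] = { ONEENTRY, SPLITRM, SPLITREG };
  for (unsigned k = 0; k < 3; ++k) {
    ModRMType t = kOrder[k];
    uint16_t compact[16];
    int seen[16] = {0};
    int ok = 1;
    for (unsigned m = 0; m < 256 && ok; ++m) {
      unsigned i = compact_index(t, (uint8_t)m);
      if (!seen[i]) {
        compact[i] = full[m];
        seen[i] = 1;
      } else if (compact[i] != full[m]) {
        ok = 0; /* two ModRM values share a slot but decode differently */
      }
    }
    if (ok) return t;
  }
  return FULL;
}
```

Trying the cheaper layouts first means an opcode only pays for 256 entries (`entries_for(FULL)`) when its decoding genuinely depends on the rm bits.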

To really do better, I think the indexing system needs to be fundamentally redesigned. Having all the ATTR tables causes a lot of extra bloat. We should index first by opcode map (1-byte, 2-byte, 3-byte), then by opcode, then by attr.
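
A rough C sketch of the map, then opcode, then attr ordering proposed above. It assumes hypothetical sizes (three opcode maps, four attribute contexts), a single 16-bit instruction ID per (map, opcode, attr), and omits ModRM handling entirely. The point it illustrates: once the attribute is the innermost index, an opcode that decodes identically in every context can store one ID instead of one per context. `build()` derives this compact form from an attribute-major table like the one being replaced.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes for illustration only. */
enum { NUM_MAPS = 3, NUM_OPCODES = 256, NUM_ATTRS = 4 };

typedef uint16_t InstrUID; /* hypothetical instruction ID; 0 = invalid */

/* Per-opcode header: where this opcode's IDs start in a shared pool and
   whether they vary by attribute context. */
typedef struct {
  uint16_t first;   /* offset into pool */
  uint8_t per_attr; /* 0: one ID for every attr; 1: NUM_ATTRS IDs */
} OpcodeEntry;

typedef struct {
  OpcodeEntry opcodes[NUM_MAPS][NUM_OPCODES];
  InstrUID pool[NUM_MAPS * NUM_OPCODES * NUM_ATTRS]; /* worst-case capacity */
  unsigned pool_size;                                /* entries actually used */
} OpcodeFirstTable;

/* Flat index into an attribute-major input laid out as [map][attr][opcode]. */
static unsigned am_index(unsigned map, unsigned attr, unsigned op) {
  return (map * NUM_ATTRS + attr) * NUM_OPCODES + op;
}

/* Map -> opcode -> attr lookup. */
static InstrUID lookup(const OpcodeFirstTable *t, unsigned map,
                       unsigned opcode, unsigned attr) {
  const OpcodeEntry *e = &t->opcodes[map][opcode];
  return t->pool[e->first + (e->per_attr ? attr : 0)];
}

/* Convert an attribute-major table into the opcode-first layout, storing a
   single pool entry for opcodes whose ID is the same in every context. */
static void build(OpcodeFirstTable *t, const InstrUID *attr_major) {
  t->pool_size = 0;
  for (unsigned map = 0; map < NUM_MAPS; ++map) {
    for (unsigned op = 0; op < NUM_OPCODES; ++op) {
      InstrUID base = attr_major[am_index(map, 0, op)];
      int uniform = 1;
      for (unsigned a = 1; a < NUM_ATTRS; ++a)
        if (attr_major[am_index(map, a, op)] != base) uniform = 0;
      OpcodeEntry *e = &t->opcodes[map][op];
      e->first = (uint16_t)t->pool_size;
      e->per_attr = (uint8_t)!uniform;
      unsigned n = uniform ? 1u : (unsigned)NUM_ATTRS;
      for (unsigned a = 0; a < n; ++a)
        t->pool[t->pool_size++] = attr_major[am_index(map, a, op)];
    }
  }
}
```

With these sizes the attribute-major input always holds 3 × 4 × 256 = 3072 IDs, while the pool holds one ID per opcode plus three extra for each opcode that actually varies by context.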

Quuxplusone commented 12 years ago
Wow, that was a huge win! From 1988020 bytes and:
    Section (__TEXT, __text): 8356
    Section (__TEXT, __cstring): 46982
    Section (__DATA, __const): 962128 (relocatable data)
    Section (__TEXT, __const): 234816 (read-only data)
    Section (__TEXT, __eh_frame): 408
    Section (__TEXT, __eh_frame): 408

to 890820 bytes and:

    Section (__TEXT, __text): 8448
    Section (__TEXT, __cstring): 46982
    Section (__DATA, __const): 249424 (down over 700K!)
    Section (__TEXT, __const): 432656
    Section (__TEXT, __eh_frame): 408

You just carved off over a MB, and sped up the build of that .o file by a lot
too! Thank you!!

I tend to completely agree with you about redesigning the tables.  Your
approach makes a lot of sense.
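
A quick check of the figures above, using only the sizes listed:

```math
\begin{aligned}
\text{object file:} \quad & 1988020 - 890820 = 1097200 \text{ bytes} \approx 1.05 \text{ MiB} \\
\text{relocatable data (DATA const):} \quad & 962128 - 249424 = 712704 \text{ bytes removed} \\
\text{read-only data (TEXT const):} \quad & 432656 - 234816 = 197840 \text{ bytes added} \\
\text{sum of listed sections:} \quad & 1252690 - 737918 = 514772 \text{ bytes removed}
\end{aligned}
```

The relocatable data dropped by about 713K while read-only data grew by about 198K, consistent with some tables moving from relocated to read-only storage. The object file shrank by 582428 bytes more than the listed sections did; that remainder presumably comes from relocation entries and related metadata no longer being emitted, though the listing does not break it down.
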
Quuxplusone commented 12 years ago
With r150303 we're at:
    Section (__TEXT, __text): 9276
    Section (__TEXT, __cstring): 826
    Section (__TEXT, __const): 646448
    Section (__LD, __compact_unwind): 160
    Section (__TEXT, __eh_frame): 408

The relocatable data has moved into __TEXT (there is no longer a __DATA, __const section), and the __cstring data is now shared with libX86Desc.