Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

MC instruction info tables are hugely bloated #12081

Open Quuxplusone opened 12 years ago

Quuxplusone commented 12 years ago
Bugzilla Link PR11954
Status NEW
Importance P enhancement
Reported by Chris Lattner (clattner@nondot.org)
Reported on 2012-02-08 19:11:30 -0800
Last modified on 2012-02-16 20:53:48 -0800
Version 1.0
Hardware PC All
CC benny.kra@gmail.com, llvm-bugs@lists.llvm.org, pageexec@gmail.com, rafael@espindo.la, stoklund@2pi.dk
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
Some of the largest .o files in LLVM are the
<target>/MCTargetDesc/Release/<target>MCTargetDesc.o files.  For example, X86's
is 680K and ARM's is 580K.  By far the biggest contributor to this bloat comes
from the "GET_INSTRINFO_MC_DESC" part of "X86GenInstrInfo.inc" (it is 570K of
X86's 680K), and almost all of the bloat is in the __DATA,__const section
because it has relocations in it (affecting startup time) for PIC builds.

The easiest improvement is because MCInstrDesc is also just poorly packed: it
has dead struct padding in it on 64-bit hosts.  Reordering the fields would
same quite a bit.

The relocations come from the references to other arrays (i.e. ImplicitList and
OperandInfo).  Instead of pointers to these arrays, it would be better to
flatten all the arrays into a single ImplicitList and then use indexes into it.
In pseudo code, instead of:

static const unsigned ImplicitList1[] = { X86::EFLAGS, 0 };
static const unsigned ImplicitList2[] = { X86::AX, 0 };
...
static const unsigned ImplicitList7[] = { X86::ESP, X86::EFLAGS, 0 };

...
  { .... ImplicitList1 ... },
  { .... ImplicitList2 ... },

it would be better to emit these as:

static const unsigned ImplicitList[] = { X86::EFLAGS, 0, X86::AX, 0, X86::AX, 0
};
...
  { .... 0 /*offset of "ImplicitList0" in ImplicitList*/ ... },
  { .... 2 ... },

This would eliminate the relocations and shrink the entry from 8 bytes to
something more reasonable (4 or even 2).  Further, this would allow us to go
crazy and unique together the tails of these lists.  For example, ImplicitList1
is the the same as "ImplicitList7+1" in the example above.

The "Name" field should be merged into a big string and use indexes into the
string, the same way we handle the instruction names in the generated
printInstruction() method of the target asmprinters.
Quuxplusone commented 12 years ago

Please pay attention to compile times when doing this.

Some of these entries are very frequently used, and it doesn't make sense to replace a one-time startup cost with a dynamic computation that is executed millions of times.

Quuxplusone commented 12 years ago
Speaking of runtime, we have some code that loops over MachineInstrs checking
whether an instruction is a PHI or a DBG_VALUE. It does this by loading the
opcode from the corresponding MCInstrDesc, that load is incredibly slow each
time we hit a new instruction because the table is way too large to fit into
any cache. The painful part is that we can calculate the opcode by

MCInstrDesc *Base, MCInstrDesc *Instr;
unsigned Opcode = Instr - Base;

without looking at the table at all, because they are stored sequentially . Of
course this requires us to store "Base" somewhere, but if we want to store the
name and operands as indices we'll need a base ptr for them too.
Quuxplusone commented 12 years ago

Ben, it might be worth it to sacrifice an MI flag: IsInvariantOpcode. It would be set for any of the opcodes in TargetOpcodes.h.

I would want to measure the speedup first, though.

Quuxplusone commented 12 years ago

It could be worth adding a bit for that, but it would also be nice to speed up all getOpcode() calls :)

Quuxplusone commented 12 years ago

I think it would be fine to make getName() a method on MCInstrInfo instead of MCInstrDesc. We already do the same thing for MCRegisterInfo::getName(). That would allow the names to be stored somewhere else. They are only used for debugging anyway.

For fast MI::getOpcode(), it seems you would need to store a base pointer inside the MI. That might be worth it, if it provides a significant speedup.

Quuxplusone commented 12 years ago
(In reply to comment #5)
> For fast MI::getOpcode(), it seems you would need to store a base pointer
> inside the MI. That might be worth it, if it provides a significant speedup.

Just adding the opcode to MI would be easier and it's only 2 bytes so we can
probably squeeze it into padding somewhere.
Quuxplusone commented 12 years ago

There is 16 bits after AsmPrinterFlags. It's worth a shot.

Please measure the performance impact.

Quuxplusone commented 12 years ago
(In reply to comment #7)
> There is 16 bits after AsmPrinterFlags. It's worth a shot.
>
> Please measure the performance impact.

A quick test showed that on the large test case with debug info I had in mind,
the actual impact is within the noise. It's not worth it to add additional
complexity and for this.

Anyways, this is a bit off topic for this bug report.
Quuxplusone commented 12 years ago
(In reply to comment #8)
> A quick test showed that on the large test case with debug info I had in mind,
> the actual impact is within the noise.

Turns out L2 cache works pretty well. Thanks anyway, Ben.
Quuxplusone commented 12 years ago
(In reply to comment #5)
> I think it would be fine to make getName() a method on MCInstrInfo instead of
> MCInstrDesc. We already do the same thing for MCRegisterInfo::getName(). That
> would allow the names to be stored somewhere else. They are only used for
> debugging anyway.

With r150245 the names are accessible via MCInstrInfo and allocated in an
indexed string table. There is another copy of that table in AsmWriter and the
x86 disassembler has its own copies of the strings to determine whether an
instruction is 16 bit(?). It would be nice to unify them eventually.