Experimental: Better inline asm

agnat commented 9 years ago

This is work in progress.

I started looking into inline assembly. I'm currently stuck at this test case from CodeGen/AVR/inline-asm.ll:

; RUN: llc < %s -march=avr | FileCheck %s

; CHECK-LABEL: multibyte_i32
define void @multibyte_i32(i32 %a) {
entry:
; CHECK: instr r22 r23 r24 r25
  call void asm sideeffect "instr ${0:A} ${0:B} ${0:C} ${0:D}", "r"(i32 %a)
; CHECK: instr r25 r24 r23 r22
  call void asm sideeffect "instr ${0:D} ${0:C} ${0:B} ${0:A}", "r"(i32 %a)
  ret void
}

Seems like handling of i32 values is missing. I've seen a few comments along the line "We only support i8 and i16". Why is that? Could you elaborate?

dylanmckay commented 9 years ago

I'm not really sure what you mean by "i32 support is missing?" - it is missing from what? Inline assembly? Instruction selection?

agnat commented 9 years ago

[...] it is missing from what?

Hehe... that's exactly what I'm trying to find out. The case at hand is related to inline assembly. But I believe I've seen comments and asserts to the same effect more than once. Other parts obviously do support larger types (artithmetic &c.). I guess my question is: Is it "not supported yet" or is it "here be dragons – not supported".

Since you weren't like "We decided to ignore >i16 because...", I guess it's just not implemented, right?

BTW, did you look at the patch? What do you think? Not worth the effort? To much mumbo-jumbo? I'm not sure yet...

dylanmckay commented 9 years ago

The patch looks like a simplification of the code to make it less verbose - that is always a win.

I have a feeling that the case you mention isn't suggesting that i32 support should be added - it is simply saying that these are the only possible options (there are no AVR instructions which take >16bit operands). There are only 8 and 16 bit register classes, so the function at hand couldn't use any other value types.

The only other place in the code that mentions "missing 32/16 bit support" that I can think of is AVRInstrInfo.td. In this case, it means that we are missing 16-bit (pseudo) variants of some instructions, so we cannot perform ISel on them, leading to instruction selection errors (pattern not resolved), or we miss an optimisation.

agnat commented 9 years ago

There are only 8 and 16 bit register classes [...]

I think that is part of the problem. Looking at the test above and looking at what ARM does, it seems we need a quad pseudo register class. Let me look into it...

dylanmckay commented 9 years ago

If we need to define 16, 32, (and possibly 64) bit ISel patterns, we really need to come up with a way to clean up AVRExpandPseudoInsts.cpp, as it is already at 1000+ lines of code, which is mostly repeated code.

agnat commented 9 years ago

I don't think we have to define additional patterns. Maybe load and store... at least not for the inline assembler.

I think we're on the same page. I don't want to add more pseudos let alone a whole set of them. I'm always looking for opportunities to offload work to ... existing infrastructure. ;)

Regarding AVRExpandPseudoInsts.cpp, I noticed the pattern. Pretty sure we can DRY that up. However, I think we should do #107 first... and yes that is a lot of work as it is.

dylanmckay commented 9 years ago

I am not much help when it comes to pseudo instructions, but only because I don't understand the live variables (I think) algorithm. Registers are marked as killed or dead in AVRExpandPseudoInsts.cpp, and I don't really know what's going on.

I think #107 is a good idea to do first as well - we will undoubtedly find lots of small, easy to fix bugs in the process.

agnat commented 9 years ago

lol ... same here. Seems like quite a massacre.

But: We're going to find out. ;)

Fortunately, we don't have to grok the whole register allocation algorithm. We just have to learn how to properly use it's API.

I think the main issue with AVRExpandPseudoInsts.cpp is: It does everything by index, which seems very error prone. I think I've read something about "named operand access", which sounds promising in this regard. Another thing to look into, though...

agnat commented 9 years ago

So...

This adds sub register access for i32 values, as in:

  call void asm sideeffect "instr ${0:A} ${0:B} ${0:C} ${0:D}", "r"(i32 %a)

However, I believe there must be a better way to achieve the same. I found this comment in ARMDAGToDAGISel::SelectInlineAsm:

Normally, i64 data is bounded to two arbitrary GRPs for "%r" constraint. However, some instrstions (e.g. ldrexd/strexd in ARM mode) require (even/even+1) GPRs and use %n and %Hn to refer to the individual regs respectively. Since there is no constraint to explicitly specify a reg pair, we use GPRPair reg class for "%r" for 64-bit data. For Thumb, the 64-bit data may be referred by H, Q, R modifiers, so we still pack them into a GPRPair.

We want what they describe as normality: Four arbitrary GPRs containing our i32 (I think it's called "expansion").

Also, note that the current implementation has bugs around here... and there. I'll probably let it rest for a while and revisit it when I have a better understanding of the SelectionDAG.

Still experimental. Do not merge. ;)

agnat commented 9 years ago

OT: I somehow have the feeling AVRExpandPseudoInsts.cpp is not the way to go. It feels all wrong. Specially those instructions that just perform the same op on the low and the high part raise an eyebrow. I think non of this should be necessary.

avr-llvm / llvm

Experimental: Better inline asm #126