avr-llvm / llvm

[MERGED UPSTREAM] AVR backend for the LLVM compiler library

Division/remainder support #130

Closed dylanmckay closed 9 years ago

dylanmckay commented 9 years ago

This is a work-in-progress patch for adding division support (by calling runtime library functions).

AVR's libgcc does not implement standalone division routines; it more or less only implements combined division-and-remainder (divmod) functions.

This patch allows LegalizeDAG.cpp to expand a DIV or REM node into a DIVREM node, throwing away one of the results. It also adds a special calling convention for AVR runtime library functions.

The problem is then lowering the DIVREM node into a correct call of __udivmodqi4, __divmodsi4, etc.
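As a rough illustration of the DIV/REM to DIVREM expansion described above, here is a minimal C++ sketch. The names `DivRem8`, `udivrem8`, `udiv8` and `urem8` are hypothetical and do not come from the patch; `udivrem8` merely stands in for a combined runtime routine such as `__udivmodqi4`, and the sketch says nothing about how LegalizeDAG.cpp actually builds the nodes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result pair of a combined divide-and-remainder routine.
struct DivRem8 {
  uint8_t quot;  // quotient
  uint8_t rem;   // remainder
};

// Hypothetical stand-in for a combined libcall such as __udivmodqi4.
// The divisor is assumed to be non-zero.
inline DivRem8 udivrem8(uint8_t a, uint8_t b) {
  return {static_cast<uint8_t>(a / b), static_cast<uint8_t>(a % b)};
}

// A plain DIV becomes a DIVREM whose remainder is thrown away.
inline uint8_t udiv8(uint8_t a, uint8_t b) { return udivrem8(a, b).quot; }

// A plain REM becomes a DIVREM whose quotient is thrown away.
inline uint8_t urem8(uint8_t a, uint8_t b) { return udivrem8(a, b).rem; }
```

Under these assumptions, `udiv8(7, 2)` returns 3 and `urem8(7, 2)` returns 1, both obtained from the same combined routine.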

I can find two different ways:

@agnat What are your thoughts?

agnat commented 9 years ago

IIUC, ARM has the exact same issue, right? Then I'd argue there is a pattern: some targets' libcalls have a custom CC and pass things in registers, notably DIVREM, and notably its remainder.

ARM ended up using option 1, custom lowering. I'd probably

  1. duplicate that and make it work on our end
  2. look into pushing it down to LegalizeDAG, getting rid of both the ARM and the AVR customization

However, it's kind of hard to say without trying and tracing since I'm still lacking the big picture...

agnat commented 9 years ago

> ARM ended up using option 1, custom lowering.

Correction:

ARM ended up using option two, obviously. Sorry about that.

dylanmckay commented 9 years ago

I have now made DIVREM preferred over DIV or REM nodes.

DIVREM is now custom expanded into the appropriate libcall.
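To make "the appropriate libcall" a bit more concrete, here is a small self-contained sketch of how a routine name could be picked from operand width and signedness. The helper `divmodLibcallName` is hypothetical and is not the code in the patch; the names it produces follow libgcc's machine-mode suffixes (`qi` for 8-bit, `hi` for 16-bit, `si` for 32-bit), of which this thread itself only mentions `__udivmodqi4`, `__divmodqi4` and `__divmodsi4`. Other widths are deliberately left unmapped.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the libgcc divmod routine name for the given
// operand width and signedness, or an empty string when this sketch has no
// mapping for that width.
inline std::string divmodLibcallName(unsigned bits, bool isSigned) {
  const char *mode = nullptr;
  switch (bits) {
    case 8:  mode = "qi"; break;  // quarter-integer mode, 8 bits
    case 16: mode = "hi"; break;  // half-integer mode, 16 bits
    case 32: mode = "si"; break;  // single-integer mode, 32 bits
    default: return "";           // not covered by this sketch
  }
  return std::string(isSigned ? "__divmod" : "__udivmod") + mode + "4";
}
```

For example, `divmodLibcallName(8, false)` yields `__udivmodqi4` and `divmodLibcallName(32, true)` yields `__divmodsi4`.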

Here is an example IR file:


define i8 @do_thing(i8 %a, i8 %b) {
    %result = udiv i8 %a, %b
    ret i8 %result
}

define i8 @black_box(i8 %a, i8 %b) {
    ret i8 %a
}

define i8 @main() {
    %1 = call i8 @do_thing(i8 2, i8 3)
    %2 = call i8 @black_box(i8 %1, i8 %1)
    ret i8 %2
}

Which, when compiled with LLVM and linked with GCC, gives this:

a.out:     file format elf32-avr

Disassembly of section .text:

00000000 <do_thing>:
   0:   0e 94 0d 00     call    0x1a    ; 0x1a <__divmodqi4>
   4:   89 2f           mov r24, r25
   6:   08 95           ret

00000008 <black_box>:
   8:   08 95           ret

0000000a <main>:
   a:   82 e0           ldi r24, 0x02   ; 2
   c:   63 e0           ldi r22, 0x03   ; 3
   e:   0e 94 00 00     call    0   ; 0x0 <do_thing>
  12:   68 2f           mov r22, r24
  14:   0e 94 04 00     call    0x8 ; 0x8 <black_box>
  18:   08 95           ret

0000001a <__divmodqi4>:
  1a:   87 fb           bst r24, 7
  1c:   08 2e           mov r0, r24
  1e:   06 26           eor r0, r22
  20:   87 fd           sbrc    r24, 7
  22:   81 95           neg r24
  24:   67 fd           sbrc    r22, 7
  26:   61 95           neg r22
  28:   05 d0           rcall   .+10        ; 0x34 <__udivmodqi4>
  2a:   0e f4           brtc    .+2         ; 0x2e <__divmodqi4_1>
  2c:   91 95           neg r25

0000002e <__divmodqi4_1>:
  2e:   07 fc           sbrc    r0, 7
  30:   81 95           neg r24

00000032 <__divmodqi4_exit>:
  32:   08 95           ret

00000034 <__udivmodqi4>:
  34:   99 1b           sub r25, r25
  36:   79 e0           ldi r23, 0x09   ; 9
  38:   04 c0           rjmp    .+8         ; 0x42 <__udivmodqi4_ep>

0000003a <__udivmodqi4_loop>:
  3a:   99 1f           adc r25, r25
  3c:   96 17           cp  r25, r22
  3e:   08 f0           brcs    .+2         ; 0x42 <__udivmodqi4_ep>
  40:   96 1b           sub r25, r22

00000042 <__udivmodqi4_ep>:
  42:   88 1f           adc r24, r24
  44:   7a 95           dec r23
  46:   c9 f7           brne    .-14        ; 0x3a <__udivmodqi4_loop>
  48:   80 95           com r24
  4a:   08 95           ret

So far I have only tested i8/i8 -> i8 division. I will add tests for the other kinds and then merge.

Looking at the machine code, this seems to follow the proper calling convention for __divmodqi4.
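For anyone checking the register usage by hand, the two library routines in the listing can be modelled in plain C++. This is only a sketch read off the disassembly above: the function and struct names are made up, variables are named after the AVR registers they mirror, and the carry flag and T flag are modelled as booleans. It assumes a non-zero divisor.

```cpp
#include <cassert>
#include <cstdint>

// Result registers as they appear in the listing.
struct DivMod8 {
  uint8_t r24;  // quotient on return
  uint8_t r25;  // remainder on return
};

// Model of __udivmodqi4: dividend in r24, divisor in r22.
inline DivMod8 udivmodqi4_model(uint8_t r24, uint8_t r22) {
  uint8_t r25 = 0;     // sub r25, r25: clears the remainder and the carry
  bool carry = false;
  for (int r23 = 9;;) {  // ldi r23, 9: loop counter
    // __udivmodqi4_ep: adc r24, r24 rotates the dividend left through carry.
    bool out = (r24 & 0x80) != 0;
    r24 = static_cast<uint8_t>((r24 << 1) | (carry ? 1 : 0));
    carry = out;
    if (--r23 == 0) break;  // dec r23 / brne (dec leaves the carry alone)
    // __udivmodqi4_loop: adc r25, r25 shifts the next dividend bit
    // into the remainder.
    r25 = static_cast<uint8_t>((r25 << 1) | (carry ? 1 : 0));
    carry = r25 < r22;  // cp r25, r22: carry set when remainder < divisor
    if (!carry) {
      // sub r25, r22: no borrow occurs, so the carry stays clear
      r25 = static_cast<uint8_t>(r25 - r22);
    }
  }
  // The quotient bits were collected inverted; com r24 flips them back.
  r24 = static_cast<uint8_t>(~r24);
  return {r24, r25};
}

// Model of __divmodqi4: signed wrapper around the unsigned routine.
inline DivMod8 divmodqi4_model(uint8_t r24, uint8_t r22) {
  bool t = (r24 & 0x80) != 0;                     // bst r24, 7
  uint8_t r0 = static_cast<uint8_t>(r24 ^ r22);   // mov r0, r24 / eor r0, r22
  if (r24 & 0x80) r24 = static_cast<uint8_t>(-r24);  // sbrc r24, 7 / neg r24
  if (r22 & 0x80) r22 = static_cast<uint8_t>(-r22);  // sbrc r22, 7 / neg r22
  DivMod8 res = udivmodqi4_model(r24, r22);          // rcall __udivmodqi4
  // brtc / neg r25: the remainder takes the sign of the dividend.
  if (t) res.r25 = static_cast<uint8_t>(-res.r25);
  // sbrc r0, 7 / neg r24: the quotient is negated when the signs differ.
  if (r0 & 0x80) res.r24 = static_cast<uint8_t>(-res.r24);
  return res;
}
```

With this model, `udivmodqi4_model(7, 2)` leaves 3 in `r24` and 1 in `r25`.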

@agnat Could you review this? The commit history is a mess (a bunch of experimentation), but the 'Files changed' should help.

agnat commented 9 years ago

I finished my review. Considering your initial problem description, this is a lot less involved than I feared.

Nice job.