ClangBuiltLinux / linux

Linux kernel source tree

ppc44x_defconfig __umoddi3 with '-mcpu=440' #1679

Open nathanchance opened 2 years ago

nathanchance commented 2 years ago

After commit 0d913358a816 ("powerpc/44x: Fix build failure with GCC 12 (unrecognized opcode: `wrteei')") in -next, I see the following error when building ppc44x_defconfig:

$ make -skj"$(nproc)" ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- LLVM=1 LLVM_IAS=0 O=build mrproper ppc44x_defconfig all
ld.lld: error: undefined symbol: __umoddi3
>>> referenced by mpage.c
>>>               mpage.o:(do_mpage_readpage) in archive fs/built-in.a
...

The only change that commit has for clang is the shift from -mcpu=powerpc to -mcpu=440.

cvise spits out:

int do_mpage_readpage_args_1_1, do_mpage_readpage_blocks_per_page,
    do_mpage_readpage_page_block, do_mpage_readpage_relative_block;
long long do_mpage_readpage_block_in_file;
void do_mpage_readpage() {
  for (;;) {
    if (do_mpage_readpage_args_1_1)
      break;
    if (do_mpage_readpage_page_block == do_mpage_readpage_blocks_per_page)
      break;
    do_mpage_readpage_page_block++;
    do_mpage_readpage_block_in_file++;
  }
  do_mpage_readpage_relative_block = do_mpage_readpage_block_in_file;
}

$ clang --target=powerpc-linux-gnu -mcpu=powerpc -O2 -c -o mpage.{o,i}

$ llvm-objdump -dr mpage.o

mpage.o:    file format elf32-powerpc

Disassembly of section .text:

00000000 <.text>:
       0: 00 00 00 00   <unknown>
            00000000:  R_PPC_REL32  .got2+0x7fd4

00000004 <do_mpage_readpage>:
       4: 7c 08 02 a6   mflr 0
       8: 90 01 00 04   stw 0, 4(1)
       c: 94 21 ff d0   stwu 1, -48(1)
      10: 93 c1 00 28   stw 30, 40(1)
      14: 93 21 00 14   stw 25, 20(1)
      18: 93 41 00 18   stw 26, 24(1)
      1c: 93 61 00 1c   stw 27, 28(1)
      20: 93 81 00 20   stw 28, 32(1)
      24: 93 a1 00 24   stw 29, 36(1)
      28: 48 00 00 05   bl 0x2c <do_mpage_readpage+0x28>
      2c: 7f c8 02 a6   mflr 30
      30: 80 7e ff d4   lwz 3, -44(30)
      34: 7f c3 f2 14   add 30, 3, 30
      38: 80 7e 80 00   lwz 3, -32768(30)
      3c: 80 83 00 00   lwz 4, 0(3)
      40: 80 7e 80 04   lwz 3, -32764(30)
      44: 28 04 00 00   cmplwi  4, 0
      48: 80 e3 00 04   lwz 7, 4(3)
      4c: 40 82 00 e8   bf  2, 0x134 <do_mpage_readpage+0x130>
      50: 80 9e 80 08   lwz 4, -32760(30)
      54: 80 be 80 0c   lwz 5, -32756(30)
      58: 81 24 00 00   lwz 9, 0(4)
      5c: 80 c5 00 00   lwz 6, 0(5)
      60: 7c 09 30 40   cmplw   9, 6
      64: 41 82 00 d0   bt  2, 0x134 <do_mpage_readpage+0x130>
      68: 81 03 00 00   lwz 8, 0(3)
      6c: 7d 23 48 f8   not 3, 9
      70: 7c 66 1a 15   add. 3, 6, 3
      74: 3b a0 00 00   li 29, 0
      78: 31 43 00 01   addic 10, 3, 1
      7c: 7d 7d 01 94   addze 11, 29
      80: 41 82 00 78   bt  2, 0xf8 <do_mpage_readpage+0xf4>
      84: 55 4c 00 3c   rlwinm 12, 10, 0, 0, 30
      88: 7c 67 60 14   addc 3, 7, 12
      8c: 7c 88 59 14   adde 4, 8, 11
      90: 33 8c ff fe   addic 28, 12, -2
      94: 7c a9 62 14   add 5, 9, 12
      98: 7c 0b 01 d4   addme 0, 11
      9c: 3b 60 00 00   li 27, 0
      a0: 33 bd 00 02   addic 29, 29, 2
      a4: 7f 7b 01 94   addze 27, 27
      a8: 7f 7a 5a 78   xor 26, 27, 11
      ac: 7f b9 62 78   xor 25, 29, 12
      b0: 7f 3a d3 79   or. 26, 25, 26
      b4: 40 82 ff ec   bf  2, 0xa0 <do_mpage_readpage+0x9c>
      b8: 7c e7 e0 14   addc 7, 7, 28
      bc: 83 be 80 08   lwz 29, -32760(30)
      c0: 7d 4a 62 78   xor 10, 10, 12
      c4: 81 9e 80 04   lwz 12, -32764(30)
      c8: 7d 08 01 14   adde 8, 8, 0
      cc: 7d 29 e2 14   add 9, 9, 28
      d0: 7d 6b 5a 78   xor 11, 11, 11
      d4: 30 e7 00 02   addic 7, 7, 2
      d8: 39 29 00 02   addi 9, 9, 2
      dc: 7d 4a 5b 79   or. 10, 10, 11
      e0: 7d 08 01 94   addze 8, 8
      e4: 91 3d 00 00   stw 9, 0(29)
      e8: 90 ec 00 04   stw 7, 4(12)
      ec: 91 0c 00 00   stw 8, 0(12)
      f0: 40 82 00 14   bf  2, 0x104 <do_mpage_readpage+0x100>
      f4: 48 00 00 40   b 0x134 <do_mpage_readpage+0x130>
      f8: 7d 04 43 78   mr  4, 8
      fc: 7c e3 3b 78   mr  3, 7
     100: 7d 25 4b 78   mr  5, 9
     104: 7c c5 30 50   sub 6, 6, 5
     108: 7c c9 03 a6   mtctr 6
     10c: 30 63 00 01   addic 3, 3, 1
     110: 38 a5 00 01   addi 5, 5, 1
     114: 7c 84 01 94   addze 4, 4
     118: 42 00 ff f4   bdnz 0x10c <do_mpage_readpage+0x108>
     11c: 80 fe 80 04   lwz 7, -32764(30)
     120: 80 de 80 08   lwz 6, -32760(30)
     124: 90 87 00 00   stw 4, 0(7)
     128: 90 67 00 04   stw 3, 4(7)
     12c: 7c 67 1b 78   mr  7, 3
     130: 90 a6 00 00   stw 5, 0(6)
     134: 80 7e 80 10   lwz 3, -32752(30)
     138: 83 a1 00 24   lwz 29, 36(1)
     13c: 83 81 00 20   lwz 28, 32(1)
     140: 90 e3 00 00   stw 7, 0(3)
     144: 83 61 00 1c   lwz 27, 28(1)
     148: 83 41 00 18   lwz 26, 24(1)
     14c: 83 21 00 14   lwz 25, 20(1)
     150: 80 01 00 34   lwz 0, 52(1)
     154: 83 c1 00 28   lwz 30, 40(1)
     158: 38 21 00 30   addi 1, 1, 48
     15c: 7c 08 03 a6   mtlr 0
     160: 4e 80 00 20   blr

$ clang --target=powerpc-linux-gnu -mcpu=440 -O2 -c -o mpage.{o,i}

$ llvm-objdump -dr mpage.o

mpage.o:    file format elf32-powerpc

Disassembly of section .text:

00000000 <.text>:
       0: 00 00 00 00   <unknown>
            00000000:  R_PPC_REL32  .got2+0x7fb4

00000004 <do_mpage_readpage>:
       4: 7c 08 02 a6   mflr 0
       8: 90 01 00 04   stw 0, 4(1)
       c: 94 21 ff b0   stwu 1, -80(1)
      10: 93 c1 00 48   stw 30, 72(1)
      14: 92 21 00 14   stw 17, 20(1)
      18: 92 41 00 18   stw 18, 24(1)
      1c: 92 61 00 1c   stw 19, 28(1)
      20: 92 81 00 20   stw 20, 32(1)
      24: 92 a1 00 24   stw 21, 36(1)
      28: 92 c1 00 28   stw 22, 40(1)
      2c: 92 e1 00 2c   stw 23, 44(1)
      30: 93 01 00 30   stw 24, 48(1)
      34: 93 21 00 34   stw 25, 52(1)
      38: 93 41 00 38   stw 26, 56(1)
      3c: 93 61 00 3c   stw 27, 60(1)
      40: 93 81 00 40   stw 28, 64(1)
      44: 93 a1 00 44   stw 29, 68(1)
      48: 48 00 00 05   bl 0x4c <do_mpage_readpage+0x48>
      4c: 7f c8 02 a6   mflr 30
      50: 80 7e ff b4   lwz 3, -76(30)
      54: 7f c3 f2 14   add 30, 3, 30
      58: 80 9e 80 00   lwz 4, -32768(30)
      5c: 80 7e 80 04   lwz 3, -32764(30)
      60: 80 84 00 00   lwz 4, 0(4)
      64: 83 23 00 04   lwz 25, 4(3)
      68: 28 04 00 00   cmplwi  4, 0
      6c: 40 82 01 24   bf  2, 0x190 <do_mpage_readpage+0x18c>
      70: 80 9e 80 08   lwz 4, -32760(30)
      74: 80 be 80 0c   lwz 5, -32756(30)
      78: 82 c4 00 00   lwz 22, 0(4)
      7c: 83 05 00 00   lwz 24, 0(5)
      80: 7c 16 c0 40   cmplw   22, 24
      84: 41 82 01 0c   bt  2, 0x190 <do_mpage_readpage+0x18c>
      88: 7e c4 b0 f8   not 4, 22
      8c: 82 a3 00 00   lwz 21, 0(3)
      90: 7c 98 22 14   add 4, 24, 4
      94: 28 04 00 04   cmplwi  4, 4
      98: 40 80 00 14   bf  0, 0xac <do_mpage_readpage+0xa8>
      9c: 7e ba ab 78   mr  26, 21
      a0: 7f 3b cb 78   mr  27, 25
      a4: 7e d7 b3 78   mr  23, 22
      a8: 48 00 00 b8   b 0x160 <do_mpage_readpage+0x15c>
      ac: 3a 80 00 00   li 20, 0
      b0: 33 64 00 01   addic 27, 4, 1
      b4: 38 a0 00 00   li 5, 0
      b8: 38 c0 00 05   li 6, 5
      bc: 7f 54 01 94   addze 26, 20
      c0: 7f 64 db 78   mr  4, 27
      c4: 3a 20 00 05   li 17, 5
      c8: 7f 43 d3 78   mr  3, 26
      cc: 48 00 00 01   bl 0xcc <do_mpage_readpage+0xc8>
            000000cc:  R_PPC_PLTREL24   __umoddi3
      d0: 7e 64 d8 10   subc    19, 27, 4
      d4: 7c 9c 23 78   mr  28, 4
      d8: 7c 7d 1b 78   mr  29, 3
      dc: 38 a0 00 00   li 5, 0
      e0: 7e 43 d1 10   subfe 18, 3, 26
      e4: 38 c0 00 05   li 6, 5
      e8: 7f 79 98 14   addc 27, 25, 19
      ec: 7e f6 9a 14   add 23, 22, 19
      f0: 7f 55 91 14   adde 26, 21, 18
      f4: 30 93 ff fb   addic 4, 19, -5
      f8: 7c 72 01 d4   addme 3, 18
      fc: 48 00 00 01   bl 0xfc <do_mpage_readpage+0xf8>
            000000fc:  R_PPC_PLTREL24   __udivdi3
     100: 7c c4 88 16   mulhwu 6, 4, 17
     104: 38 a0 00 00   li 5, 0
     108: 1c 63 00 05   mulli 3, 3, 5
     10c: 1c 84 00 05   mulli 4, 4, 5
     110: 7c 66 1a 14   add 3, 6, 3
     114: 30 a5 00 05   addic 5, 5, 5
     118: 7e 94 01 94   addze 20, 20
     11c: 7c a7 9a 78   xor 7, 5, 19
     120: 7e 86 92 78   xor 6, 20, 18
     124: 7c e6 33 79   or. 6, 7, 6
     128: 40 82 ff ec   bf  2, 0x114 <do_mpage_readpage+0x110>
     12c: 7c d6 22 14   add 6, 22, 4
     130: 7c 99 20 14   addc 4, 25, 4
     134: 80 be 80 08   lwz 5, -32760(30)
     138: 80 fe 80 04   lwz 7, -32764(30)
     13c: 7c 75 19 14   adde 3, 21, 3
     140: 38 c6 00 05   addi 6, 6, 5
     144: 33 24 00 05   addic 25, 4, 5
     148: 7f 84 eb 79   or. 4, 28, 29
     14c: 7c 63 01 94   addze 3, 3
     150: 90 c5 00 00   stw 6, 0(5)
     154: 93 27 00 04   stw 25, 4(7)
     158: 90 67 00 00   stw 3, 0(7)
     15c: 41 82 00 34   bt  2, 0x190 <do_mpage_readpage+0x18c>
     160: 7c 77 c0 50   sub 3, 24, 23
     164: 7c 69 03 a6   mtctr 3
     168: 33 7b 00 01   addic 27, 27, 1
     16c: 3a f7 00 01   addi 23, 23, 1
     170: 7f 5a 01 94   addze 26, 26
     174: 42 00 ff f4   bdnz 0x168 <do_mpage_readpage+0x164>
     178: 80 7e 80 08   lwz 3, -32760(30)
     17c: 80 9e 80 04   lwz 4, -32764(30)
     180: 7f 79 db 78   mr  25, 27
     184: 92 e3 00 00   stw 23, 0(3)
     188: 93 44 00 00   stw 26, 0(4)
     18c: 93 64 00 04   stw 27, 4(4)
     190: 80 7e 80 10   lwz 3, -32752(30)
     194: 83 a1 00 44   lwz 29, 68(1)
     198: 83 81 00 40   lwz 28, 64(1)
     19c: 83 61 00 3c   lwz 27, 60(1)
     1a0: 93 23 00 00   stw 25, 0(3)
     1a4: 83 41 00 38   lwz 26, 56(1)
     1a8: 83 21 00 34   lwz 25, 52(1)
     1ac: 83 01 00 30   lwz 24, 48(1)
     1b0: 82 e1 00 2c   lwz 23, 44(1)
     1b4: 82 c1 00 28   lwz 22, 40(1)
     1b8: 82 a1 00 24   lwz 21, 36(1)
     1bc: 82 81 00 20   lwz 20, 32(1)
     1c0: 82 61 00 1c   lwz 19, 28(1)
     1c4: 82 41 00 18   lwz 18, 24(1)
     1c8: 82 21 00 14   lwz 17, 20(1)
     1cc: 80 01 00 54   lwz 0, 84(1)
     1d0: 83 c1 00 48   lwz 30, 72(1)
     1d4: 38 21 00 50   addi 1, 1, 80
     1d8: 7c 08 03 a6   mtlr 0
     1dc: 4e 80 00 20   blr

nickdesaulniers commented 2 years ago

Guessing this is SCEV getting more powerful again; there are libcalls to __umoddi3 followed by __udivdi3, which looks a bit like a mod+div, similar to https://github.com/ClangBuiltLinux/linux/issues/1666. do_mpage_readpage_block_in_file is a 64-bit long long...

nathanchance commented 2 years ago

I thought so too but this is reproducible with clang-13 at least:

https://github.com/ClangBuiltLinux/continuous-integration2/runs/7543585026?check_suite_focus=true#step:5:141

nickdesaulniers commented 2 years ago

Guessing this is SCEV getting more powerful again; there are libcalls to __umoddi3 followed by __udivdi3, which looks a bit like a mod+div, similar to https://github.com/ClangBuiltLinux/linux/issues/1666.

  %n.mod.vf = urem i64 %5, 5
  ...
  %7 = udiv i64 %6, 5

is being lowered to:

  $r3 = COPY %44:gprc
  $r4 = COPY %43:gprc
  $r5 = COPY %42:gprc
  $r6 = COPY %45:gprc
  BL target-flags(ppc-plt) &__umoddi3, <regmask $cr2 $cr3 $cr4 $f14 $f15 $f16 $f17 $f18 $f19 $f20 $f21 $f22 $f23 $f24 $f25 $f26 $f27 $f28 $f29 $f30 $f31 $r14 $r15 $r16 $r17 $r18 $r19 $r20 $r21 $r22 $r23 $r24 $r25 and 18 more...>, implicit-def dead $lr, implicit $rm, implicit $r3, implicit $r4, implicit $r5, implicit $r6, implicit-def $r1, implicit-def $r3, implicit-def $r4
  ADJCALLSTACKUP 8, 0, implicit-def dead $r1, implicit $r1
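
For readers following along: the urem/udiv pair above appears to have the shape of a trip-count split, where the compiler peels off `n % 5` tail iterations and handles the rest in groups of five. The sketch below is a hypothetical C rendering of that arithmetic. The names `trip_split` and `split_trip_count` are made up for illustration and do not come from LLVM or the kernel. On a 32-bit target, a 64-bit `%` or `/` that the compiler does not expand inline is typically emitted as a call to `__umoddi3` or `__udivdi3`, the first of which is the symbol ld.lld reports as undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only: how a 64-bit trip count might be
 * split into full groups of `factor` iterations plus a scalar tail. */
struct trip_split {
    uint64_t main_groups; /* number of full groups of `factor` iterations */
    uint64_t tail_iters;  /* leftover iterations, always < factor */
};

static struct trip_split split_trip_count(uint64_t n, uint64_t factor)
{
    struct trip_split s;

    assert(factor != 0);
    /* 64-bit remainder: may become a __umoddi3 call on 32-bit targets. */
    s.tail_iters = n % factor;
    /* 64-bit quotient: may become a __udivdi3 call on 32-bit targets. */
    s.main_groups = (n - s.tail_iters) / factor;
    return s;
}
```

With `factor` fixed at 5, as in the IR, `split_trip_count(12, 5)` yields two full groups and a tail of two iterations.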

nickdesaulniers commented 2 years ago

Additional report

nickdesaulniers commented 2 years ago

Fixed by:

  1. https://reviews.llvm.org/D131442
  2. https://reviews.llvm.org/D130862 (cc @topperc)
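
As a rough illustration of why a 64-bit remainder by a constant such as 5 does not inherently need a libcall on a 32-bit target: 2^32 leaves remainder 1 when divided by 5, so the two 32-bit halves can be reduced separately and then combined. The helper name `umod64_by_5` is hypothetical, and this is a sketch of the arithmetic identity only. It is not claimed to be what the patches above implement.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: x % 5 for a 64-bit x using only 32-bit remainders.
 * Write x = hi * 2^32 + lo. Because 2^32 % 5 == 1, it follows that
 * x % 5 == (hi % 5 + lo % 5) % 5. The inner sum is at most 8, so it
 * cannot overflow a 32-bit register. */
static uint32_t umod64_by_5(uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t lo = (uint32_t)x;

    return (hi % 5u + lo % 5u) % 5u;
}
```

The same identity does not hold for every divisor. It relies on 2^32 being congruent to 1 modulo the divisor, which is true for 5 but not for an even divisor such as 12.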

nickdesaulniers commented 1 year ago

The llvm patches have all landed. What do we need to do on the kernel side? Sounds like this is still an issue for clang-13 through clang-15? I think those llvm commits are too risky to ship in clang-15.

nathanchance commented 1 year ago

Just say that ppc44x_defconfig is broken with earlier versions of clang? I am not really sure how we can work around this in the kernel sources other than selectively reverting 2255411d1d0f0661d1e5acd5f6edf4e6652a345a for earlier versions of clang?

topperc commented 1 year ago

@nickdesaulniers The llvm patch for even divisors like 12 was broken and got reverted. I haven't re-committed it yet.

nathanchance commented 1 year ago

This appears resolved for me with both the latest tip of tree and 16.0.0-rc4. Unfortunately, the resulting kernel does not boot in QEMU, but that seems tangential to this issue. I've filed https://github.com/ClangBuiltLinux/linux/issues/1814 for that, which has a workaround that would help us avoid this issue.