Open Quuxplusone opened 6 years ago
Bugzilla Link | PR36243 |
Status | CONFIRMED |
Importance | P enhancement |
Reported by | Marcin Kościelnicki (koriakin@0x04.net) |
Reported on | 2018-02-05 14:34:12 -0800 |
Last modified on | 2019-12-27 06:59:01 -0800 |
Version | trunk |
Hardware | PC All |
CC | andyg1001@hotmail.co.uk, dave@znu.io, hfinkel@anl.gov, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, nemanja.i.ibm@gmail.com, qshanz@cn.ibm.com, spatel+llvm@rotateright.com |
Fixed by commit(s) | rG257acbf6aee9 |
Attachments | |
Blocks | |
Blocked by | |
See also | PR39464 |
I have also found this. Here is the code I am using:
#include <stdint.h>

struct UInt96
{
    uint32_t d2, d1, d0;
};

void ADD96_v1(const UInt96& s, UInt96& d)
{
    uint64_t sum = uint64_t(d.d0) + uint64_t(s.d0);
    d.d0 = sum; sum >>= 32;
    sum += uint64_t(d.d1) + uint64_t(s.d1);
    d.d1 = sum; sum >>= 32;
    sum += uint64_t(d.d2) + uint64_t(s.d2);
    d.d2 = sum;
}

void ADD96_v2(const UInt96& s, UInt96& d)
{
    uint32_t carry;
    d.d0 = __builtin_addc(d.d0, s.d0, 0, &carry);
    d.d1 = __builtin_addc(d.d1, s.d1, carry, &carry);
    d.d2 = __builtin_addc(d.d2, s.d2, carry, &carry);
}
ADD96_v1 was my original function, and ADD96_v2 is my attempt to rewrite it
using __builtin_addc. While the second version works, I am surprised that it
is less optimal than the first version: using the Godbolt Compiler Explorer,
the generated ARM code contains an additional three seemingly unnecessary
instructions; worse, the PPC code uses branch instructions!
My compiler options are:
-target arm-unknown-linux-gnueabihf -mcpu=cortex-a9 -O2
and:
-target powerpc-unknown-linux-gnu -mcpu=603e -O2
It seems that the __builtin_addc* functions don't really have an advantage; is
that true?
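For anyone comparing the two forms without a Clang build to hand, here is a minimal portable sketch of the same 96-bit add. The helper addc32 is a hypothetical stand-in written for illustration, not Clang's implementation; it assumes the carry-in is always 0 or 1, which is how the chain in ADD96_v2 uses it. The names add96_wide and add96_chain are also illustrative and simply mirror ADD96_v1 and ADD96_v2.

```cpp
#include <cstdint>

struct UInt96 {
    uint32_t d2, d1, d0;  // d0 is the least significant limb, as in the report
};

// Hypothetical portable stand-in for __builtin_addc (assumption: carry_in is 0 or 1).
// Returns x + y + carry_in modulo 2^32 and stores the carry out (0 or 1) in *carry_out.
static inline uint32_t addc32(uint32_t x, uint32_t y, uint32_t carry_in,
                              uint32_t* carry_out) {
    uint32_t partial = x + y;
    uint32_t c1 = partial < x;        // carry out of x + y
    uint32_t sum = partial + carry_in;
    uint32_t c2 = sum < partial;      // carry out of adding carry_in
    *carry_out = c1 | c2;             // at most one of c1, c2 can be set here
    return sum;
}

// Same approach as ADD96_v1: widen to 64 bits and shift the carry down.
void add96_wide(const UInt96& s, UInt96& d) {
    uint64_t sum = uint64_t(d.d0) + uint64_t(s.d0);
    d.d0 = static_cast<uint32_t>(sum); sum >>= 32;
    sum += uint64_t(d.d1) + uint64_t(s.d1);
    d.d1 = static_cast<uint32_t>(sum); sum >>= 32;
    sum += uint64_t(d.d2) + uint64_t(s.d2);
    d.d2 = static_cast<uint32_t>(sum);
}

// Same shape as ADD96_v2, with addc32 in place of the builtin.
void add96_chain(const UInt96& s, UInt96& d) {
    uint32_t carry = 0;
    d.d0 = addc32(d.d0, s.d0, 0, &carry);
    d.d1 = addc32(d.d1, s.d1, carry, &carry);
    d.d2 = addc32(d.d2, s.d2, carry, &carry);  // final carry is discarded
}
```

Both functions should give the same result for any input, so the question in this report is purely about the quality of the generated code. Whether a given backend turns either form into a plain add/add-with-carry sequence depends on the target and compiler version.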
Current Codegen: https://godbolt.org/z/cVSccY
rG257acbf6aee9 solves much of the poor codegen generically in DAGCombine.
x86/ARM code works as expected.
PPC still struggles, as the target has not yet been updated to use ADDCARRY/SUBCARRY.
Andy - maybe open a PPC-specific bug?
Adding some PPC guys who might know what's best to do to improve PPC support and reduce the branching.
Do we need to extend the ISD::ADD/SUBCARRY combines (where possible) to better support ISD::ADD/SUBE, or can we move PPC away from ISD::ADD/SUBC + ISD::ADD/SUBE?
We aren't currently working on moving away from the old carry setting/using nodes to the new ones, but we do not have any fundamental issue with doing so. I am not sure this statement is any help though.