Closed: tkoenig1 closed this issue 3 months ago
Try compiling with the llc flag "-disable-lsr".
I've now modified my compile script to:
#! /bin/bash
a=${1%%.c}
b=${a}_opt
clang -c -O3 -emit-llvm -fno-unroll-loops $a.c
opt -disable-loop-unrolling -O3 --march=my66000 --frame-pointer=none --enable-vvm $a.bc > $b.bc
llc --disable-lsr --enable-predication --enable-predication2 --enable-carry-generation --early-carry-coalesce --enable-vvm -march=my66000 $b.bc
and still get the same issue:
LBB0_4: ; %vector.body
; =>This Inner Loop Header: Depth=1
vec r6,{}
ldub r7,[r2,r5,15]
ldub r8,[r2,r5,14]
ldub r9,[r2,r5,13]
ldub r10,[r2,r5,12]
ldub r11,[r2,r5,11]
ldub r12,[r2,r5,10]
ldub r13,[r2,r5,9]
and so on.
Update: The loop is not vectorized when compiled with:
clang -c --target=my66000 -O3 -fno-vectorize -fno-slp-vectorize -emit-llvm -fno-unroll-loops $1
opt -disable-loop-unrolling -O3 --march=my66000 --frame-pointer=none --enable-vvm $a.bc > $b.bc
llc --disable-lsr --enable-predication --enable-predication2 --enable-carry-generation --early-carry-coalesce --enable-vvm -march=my66000 $b.bc
The generated code is:
beq0 r3,.LBB0_3
; %bb.1: ; %.preheader
ldub r4,[r2]
add r4,r4,#1
stb r4,[r1]
cmp r4,r3,#1
bne r4,.LBB0_2
.LBB0_3: ; %.loopexit
ret
.LBB0_2: ; %.preheader..lr.ph_crit_edge
add r6,r2,#1
add r5,r1,#1
mov r4,#2
.LBB0_4: ; %.lr.ph
; =>This Inner Loop Header: Depth=1
ldub r6,[r6]
add r6,r6,#1
stb r6,[r5]
cmp r5,r4,r3
beq r5,.LBB0_3
; %bb.5: ; %.lr.ph..lr.ph_crit_edge
; in Loop: Header=BB0_4 Depth=1
add r6,r2,r4
add r5,r1,r4
add r4,r4,#1
br .LBB0_
I just tried the most recent compiler, and it seems to be OK now:
memc: ; @memc
; %bb.0: ; %entry
beq0 r3,.LBB0_2
; %bb.1: ; %for.body.preheader
mov r4,#0
.LBB0_3: ; %for.body
; =>This Inner Loop Header: Depth=1
vec r5,{}
ldub r6,[r2,r4,0]
add r6,r6,#1
stb r6,[r1,r4,0]
loop1 eq,r4,r3,#1
.LBB0_2: ; %for.cond.cleanup
ret
This is with
#! /bin/bash
a=${1%%.[ci]}
b=${a}_opt
clang -fverbose-asm -c --target=my66000 -O3 -fno-vectorize -fno-slp-vectorize -emit-llvm -fno-unroll-loops -fomit-frame-pointer $1
opt -disable-loop-unrolling -O3 --march=my66000 --frame-pointer=none --enable-vvm $a.bc > $b.bc
llc -O2 -enable-remove-range-check --enable-predication --enable-predication2 --enable-carry-generation --early-carry-coalesce --enable-vvm -march=my66000 $b.bc
so I guess the issue has been fixed in the meantime.
Closing.
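For readers following along: the final `memc` loop above (ldub, add #1, stb, loop1) appears to be a byte-wise copy-and-increment. Here is a minimal C sketch reconstructed from the assembly alone. The original source is not shown in this thread, so the signature, types, and parameter names are assumptions.

```c
#include <stddef.h>

/* Hypothetical reconstruction from the generated assembly, not the
 * original source. Assumed register mapping: dst in r1, src in r2,
 * n in r3. */
void memc(unsigned char *dst, const unsigned char *src, size_t n)
{
    /* Each iteration loads one byte, adds 1, and stores one byte,
     * matching the ldub / add #1 / stb sequence in the loop body. */
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char)(src[i] + 1);
}
```

A loop of this shape has no cross-iteration dependence (assuming the two buffers do not overlap), which would be consistent with it fitting into a single vec/loop1 body as in the last listing.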
The code
when translated with
yields an interesting translation, where the function starts with
and the loop later has
Two issues: Why is there a stack intermediate at all, and why the unrolling?
When compiling with
it does not vectorize at all.