Open Quuxplusone opened 9 years ago
1.cc
Created attachment 14328
testcase 1.cc

For testcase 1.cc:

    ~/workarea/llvm-r236608/build/bin/clang++ -O2 -S 1.cc

The kernel loop in assembly:

    .LBB0_9:                        # %for.body3
                                    # Parent Loop BB0_3 Depth=1
                                    # => This Inner Loop Header: Depth=2
        decl    %edi
        cmpb    $0, (%rsi)
        jne     .LBB0_11
    # BB#10:                        # %if.then
                                    # in Loop: Header=BB0_9 Depth=2
        movl    $0, (%rdx)
        movw    $0, 4(%rdx)
    .LBB0_11:                       # %for.inc
                                    # in Loop: Header=BB0_9 Depth=2
        leaq    6(%rdx), %rbx
        decl    %edi                # could have been merged with decl %edi above
        cmpb    $0, 1(%rsi)
        jne     .LBB0_13
    # BB#12:                        # %if.then.1
                                    # in Loop: Header=BB0_9 Depth=2
        movl    $0, 6(%rdx)
        movw    $0, 10(%rdx)
    .LBB0_13:                       # %for.inc.1
                                    # in Loop: Header=BB0_9 Depth=2
        addq    $6, %rbx
        addq    $2, %rsi
        testl   %edi, %edi
        movq    %rbx, %rdx
        jne     .LBB0_9

The decl %edi in .LBB0_9 and the decl %edi in .LBB0_11 could have been merged into a single subtraction. The second decl %edi is generated by the LoopUnroll pass. We need another InstCombine pass after LoopUnroll to clean up the redundancy it leaves behind.

With another InstCombine pass added after LoopUnroll, we get:

    .LBB0_10:                       # %for.body3
                                    # Parent Loop BB0_3 Depth=1
                                    # => This Inner Loop Header: Depth=2
        cmpb    $0, (%rbx)
        jne     .LBB0_12
    # BB#11:                        # %if.then
                                    # in Loop: Header=BB0_10 Depth=2
        movl    $0, -10(%rdx)
        movw    $0, -6(%rdx)
    .LBB0_12:                       # %for.inc
                                    # in Loop: Header=BB0_10 Depth=2
        addl    $-2, %edi           # redundancy removed
        cmpb    $0, 1(%rbx)
        jne     .LBB0_14
    # BB#13:                        # %if.then.1
                                    # in Loop: Header=BB0_10 Depth=2
        movl    $0, -4(%rdx)
        movw    $0, (%rdx)
    .LBB0_14:                       # %for.inc.1
                                    # in Loop: Header=BB0_10 Depth=2
        addq    $12, %rdx
        addq    $2, %rbx
        testl   %edi, %edi
        jne     .LBB0_10
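For readers without the attachment, the difference between the two listings can be sketched at the source level. This is a hypothetical model, not the contents of 1.cc: the `Rec` type, the function names, and the flag pattern are all assumptions, chosen only because the assembly zeroes 6 bytes per iteration (a `movl` plus a `movw`) guarded by a one-byte test. The first function mirrors the unrolled loop with two separate decrements; the second mirrors the output after the extra InstCombine run, where they are folded into one subtraction by 2.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical 6-byte record; the real layout in 1.cc is not shown in the report.
struct Rec { unsigned char bytes[6]; };

// Models the unrolled-by-2 loop as emitted today: the counter is decremented
// once per original iteration, i.e. twice per unrolled iteration.
void clear_two_decrements(const unsigned char *flag, Rec *out, int n) {
    assert(n >= 0 && n % 2 == 0);  // assumption: trip count is a multiple of 2
    while (n != 0) {
        --n;
        if (flag[0] == 0) std::memset(&out[0], 0, sizeof(Rec));
        --n;  // the second decrement, introduced by unrolling
        if (flag[1] == 0) std::memset(&out[1], 0, sizeof(Rec));
        flag += 2;
        out += 2;
    }
}

// Models the loop after the extra cleanup: one subtraction by 2.
void clear_merged_decrement(const unsigned char *flag, Rec *out, int n) {
    assert(n >= 0 && n % 2 == 0);
    while (n != 0) {
        if (flag[0] == 0) std::memset(&out[0], 0, sizeof(Rec));
        n -= 2;  // corresponds to addl $-2, %edi
        if (flag[1] == 0) std::memset(&out[1], 0, sizeof(Rec));
        flag += 2;
        out += 2;
    }
}

// Runs both variants on the same input and reports whether the outputs match.
bool same_result(int n) {
    std::vector<unsigned char> flag(n);
    for (int i = 0; i < n; ++i) flag[i] = (i % 3 == 0) ? 0 : 1;  // arbitrary pattern
    Rec filled;
    std::memset(&filled, 0xAB, sizeof(filled));
    std::vector<Rec> a(n, filled), b(n, filled);
    clear_two_decrements(flag.data(), a.data(), n);
    clear_merged_decrement(flag.data(), b.data(), n);
    return a.empty() || std::memcmp(a.data(), b.data(), a.size() * sizeof(Rec)) == 0;
}
```

Both variants perform the same stores; the only difference is how the counter is updated, which is why folding the two decrements is safe as long as nothing between them reads the counter.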
Attached 1.cc (171 bytes, text/x-c++src): testcase 1.cc