Suboptimal codegen for Bool assignment

ancapdev commented 7 years ago

Observing what looks like suboptimal codegen when loading and assigning Bool variables. It seems to undergo some transformation where I'm guessing it loses the information that it's a Bool type and upon assignment forces its value to [0, 1] by binary and, which at least naively to me seems redundant.

Example below, tested on 0.5.1 and 0.6.0-pre.alpha.

type Foo
    flag1::Bool
    flag2::Bool
end

function update(foo::Foo)
    foo.flag2 = foo.flag1
end

code_native(update, (Foo,))
code_llvm(update, (Foo,))

LLVM output

define i8 @julia_update_60814(i8** dereferenceable(2)) #0 !dbg !5 {
top:
  %1 = bitcast i8** %0 to i8*
  %2 = load i8, i8* %1, align 16
  %3 = getelementptr i8, i8* %1, i64 1
  %4 = and i8 %2, 1
  store i8 %4, i8* %3, align 1
  ret i8 %4
}

Native x64 output

    .text
Filename: In[1]
    pushq   %rbp
    movq    %rsp, %rbp
Source line: 9
    movb    (%rdi), %al
    andb    $1, %al
    movb    %al, 1(%rdi)
    popq    %rbp
    retq
    nopl    (%rax)

Keno commented 7 years ago

Could probably be fixed by putting !range metadata on boolean loads.

StefanKarpinski commented 7 years ago

Would using i1 in our LLVM for Bool be a feasible approach?

Keno commented 7 years ago

In theory that should probably work, yes. In practice you'll probably run into LLVM bugs if you try to do, because no other frontend does it.

StefanKarpinski commented 7 years ago

IIRC that's probably why we didn't do it in the first place.

tkelman commented 7 years ago

I thought we did for a while, and changed because of said LLVM bugs

iamed2 commented 3 years ago

The claim that no other frontend does it is no longer true! :)

Clang appears to use i1: https://blog.matthieud.me/2020/exploring-clang-llvm-optimization-on-programming-horror/

Some of us were looking at that blog post it on Slack and wondering why Julia couldn't make the same optimizations. Julia currently generates code for the naive loop, and then vectorizes it during optimization.

function isEven2(num::T) where T 
   numComp = zero(T) 
   even = true 
   while num != numComp
      even = !even 
      numComp = numComp + one(T)
   end
   return even
end

I took the unoptimized code (@code_llvm optimize=false dump_module=true isEven2(Int32(300))) and popped it into Godbolt, compiling with clang -O1. This generates the naive loop in asm. But I noticed that one of the passes in the blog said "even is no longer a byte (i8) but stored directly as a single bit (i1), thus removing several conversion instructions." I tried manually removing the conversions and that resulted in the O(1) version being generated by clang!

Here's the version before and after my rewriting: https://gist.github.com/iamed2/863a8f329b006bdc200a1128dbf5bd6e

All this to say it seems like it might be a good idea to take another look at using i1s for Bools.

JuliaLang / julia

Suboptimal codegen for Bool assignment #21712