Closed madmann91 closed 9 years ago
Getting the address of an element of a SIMD vector is IMHO broken.
Say, you have v
of type <i64 x 4>
. Now, you want to have the address of the second element. This assumes that the vector is laid out as follows:
v0 | v1 | v2 | v3
However, maybe your ISA does not support vectors of i64
s. So, the compiler implements it with two <i32 x 4>
s. The first is the lower half, the second, the upper half. And now?
It just doesn't make sense. For the same reason Code like this is broken in C:
int32_t i = /*...*/;
// get upper half of i
int16_t* = ((int16_t*) &i)+1; // broken: undefined behavior
that said, if this fixes some hot issue for you, I'm fine of merging this. But we should file a bug for this and come up with a clean solution.
Well, I have to look into this, then. Unless there is something I have overlooked, I don't think this happens because I take the address of a vector component : it seems to happen after some program transformations. In any case, there should be an explicit message to prevent this from happening in the front-end.
The piece of code that produces the bug is :
extern "C" {
fn output(int, int, int, int) -> ();
}
fn mask_store(i: simd[bool * 4], src: simd[int * 4], mut dst: &simd[int * 4]) -> () {
if i(0) { dst(0) = src(0) }
if i(1) { dst(1) = src(1) }
if i(2) { dst(2) = src(2) }
if i(3) { dst(3) = src(3) }
}
fn main() -> () {
let mask = [true, false, true, false];
let mut v = [1, 2, 3, 4];
let w = [4, 5, 6, 7];
mask_store(mask, w, &v);
// To prevent any optimization
output(w(0), w(1), w(2), w(3))
}
As you can see, I do not take the address of a vector component. The MapExpr "dst(0)" is actually converted into "(*dst)(0)" and during codegen, there is a special case when the left operand is a pointer that generates a LEA.
If I manage to never use any pointer to vector in my code, I still get a segfault in the unreachable code elimination pass for the same reason (LEA nodes generated during vrebuild).
ok. I'll look into it.
I have to adjust the magic in thorin/irbuilder.* to not emit LEAs in the case of SIMD stuff. I hope I can fix that over the weekend.
There are still some things in the semantic analysis broken. Casts from [4 x int]
to simd[4 x int]
are allowed which should be prohibited.
Ok, this is an impala bug. I report a new issue.
This bug fix has a huge performance impact for vectors of size 8 (compared to the solution I proposed). With your fix, the code gets 3-4 times slower. This doesn't seem to affect vectors of size 4. By looking at the assembly, it generates a lot more vinsertps instructions and spills.
This problem still exists, but when force partial evaluation of the part that writes to the vector components, those insert are no longer emitted. Hence, I close this again.
Appearantly, LLVM is not smart enough to optimize those insert/extract sequences into efficient code again. If you still encounter performance issues somewhere, I'd suggest to include your patch as workaround.
This seems to correct at least two bugs linked to a lack of support for vector types in LEA nodes.