AnyDSL / thorin

The Higher-Order Intermediate Representation
https://anydsl.github.io
GNU Lesser General Public License v3.0
151 stars 15 forks source link

LEA for vectors #18

Closed madmann91 closed 9 years ago

madmann91 commented 9 years ago

This seems to correct at least two bugs linked to a lack of support for vector types in LEA nodes.

leissa commented 9 years ago

Getting the address of an element of a SIMD vector is IMHO broken.

Say, you have v of type <i64 x 4>. Now, you want to have the address of the second element. This assumes that the vector is laid out as follows: v0 | v1 | v2 | v3 However, maybe your ISA does not support vectors of i64s. So, the compiler implements it with two <i32 x 4>s. The first is the lower half, the second, the upper half. And now? It just doesn't make sense. For the same reason Code like this is broken in C:

int32_t i = /*...*/;
// get upper half of i
int16_t* = ((int16_t*) &i)+1; // broken: undefined behavior
leissa commented 9 years ago

that said, if this fixes some hot issue for you, I'm fine of merging this. But we should file a bug for this and come up with a clean solution.

madmann91 commented 9 years ago

Well, I have to look into this, then. Unless there is something I have overlooked, I don't think this happens because I take the address of a vector component : it seems to happen after some program transformations. In any case, there should be an explicit message to prevent this from happening in the front-end.

madmann91 commented 9 years ago

The piece of code that produces the bug is :

extern "C" {
    fn output(int, int, int, int) -> ();
}

fn mask_store(i: simd[bool * 4], src: simd[int * 4], mut dst: &simd[int * 4]) -> () {
    if i(0) { dst(0) = src(0) }
    if i(1) { dst(1) = src(1) }
    if i(2) { dst(2) = src(2) }
    if i(3) { dst(3) = src(3) }
}

fn main() -> () {
    let mask = [true, false, true, false];
    let mut v = [1, 2, 3, 4];
    let w = [4, 5, 6, 7];
    mask_store(mask, w, &v);
    // To prevent any optimization
    output(w(0), w(1), w(2), w(3))
}

As you can see, I do not take the address of a vector component. The MapExpr "dst(0)" is actually converted into "(*dst)(0)" and during codegen, there is a special case when the left operand is a pointer that generates a LEA.

madmann91 commented 9 years ago

If I manage to never use any pointer to vector in my code, I still get a segfault in the unreachable code elimination pass for the same reason (LEA nodes generated during vrebuild).

leissa commented 9 years ago

ok. I'll look into it.

leissa commented 9 years ago

I have to adjust the magic in thorin/irbuilder.* to not emit LEAs in the case of SIMD stuff. I hope I can fix that over the weekend.

leissa commented 9 years ago

There are still some things in the semantic analysis broken. Casts from [4 x int] to simd[4 x int] are allowed which should be prohibited.

leissa commented 9 years ago

Ok, this is an impala bug. I report a new issue.

madmann91 commented 9 years ago

This bug fix has a huge performance impact for vectors of size 8 (compared to the solution I proposed). With your fix, the code gets 3-4 times slower. This doesn't seem to affect vectors of size 4. By looking at the assembly, it generates a lot more vinsertps instructions and spills.

madmann91 commented 9 years ago

This problem still exists, but when force partial evaluation of the part that writes to the vector components, those insert are no longer emitted. Hence, I close this again.

leissa commented 9 years ago

Appearantly, LLVM is not smart enough to optimize those insert/extract sequences into efficient code again. If you still encounter performance issues somewhere, I'd suggest to include your patch as workaround.