cdl-saarland / rv

RV: A Unified Region Vectorizer for LLVM
Other
106 stars 16 forks source link

implement vectorization for coherent conditions #36

Closed ghost closed 5 years ago

ghost commented 5 years ago

Pull request as we discussed during EuroLLVM, more detail of the approach could be found https://github.com/HuihuiSun/paper/blob/master/WCCV_ICS2019preprint.pdf

I am willing to clarify unclear parts.

ghost commented 5 years ago

This check is always false. The defined shape of i1 values is either uniform or varying.

May i suggest you keep the isAffine code and instead do the following:

  1. remove the switch case for calls and the PACXX intrinsic list (lines 365 to 390).
  2. 'return true' in line 345, if the instruction has a strided shape (eg the check you inserted in this comment's line 419).

I make the shape check for every operand of the instruction, I believe this is what you meant. If we put it in line 345, then it checks the condition instruction again, which we want to avoid.

ghost commented 5 years ago

Done with factoring out legality and heuristic parts. :)

simoll commented 5 years ago

Done with factoring out legality and heuristic parts. :)

Great :) Please have a look at the remaining test failures (tests 20,28,61). When those are fixed this can go in.

ghost commented 5 years ago

Done with factoring out legality and heuristic parts. :)

Great :) Please have a look at the remaining test failures (tests 20,28,61). When those are fixed this can go in.

Sorry for slow response. Just returned from travel. I am wondering if you could lend a helping hand on this.

  1. For 20, I think your suggestion of only applying CoherentIF to branch with a single control-flow edge could solve this. Ich mache das. :)
  2. For 28, the LLVM-IR after CoherentIF is correct, while things goes wrong with @llvm.experimental.vector.reduce.and.i1.v8i1, for which the generated assemble is not correct.
  3. For 61, it works correct for me. Could you send me the stack trace?

PACXX only processes structured control flow and uses the old ptest instruction for rv_any/rv_all instrinsics. Would appreciate your further direction pointers for me to solve these.

simoll commented 5 years ago

Done with factoring out legality and heuristic parts. :)

Great :) Please have a look at the remaining test failures (tests 20,28,61). When those are fixed this can go in.

Sorry for slow response. Just returned from travel. I am wondering if you could lend a helping hand on this.

1. For 20, I think your suggestion of only applying CoherentIF to branch with a single control-flow edge could solve this. `Ich mache das.` :)

Good.

2. For 28, the LLVM-IR after `CoherentIF` is correct, while things goes wrong with `@llvm.experimental.vector.reduce.and.i1.v8i1`, for which the generated assemble is not correct.

When the coherent condition check fails the store is not predicated as it should be (compare the code with and without CIF.. there is an any guard for the store in the normal version but not in the CIF one. Even with the CIF all test you still need to test for any before the store can execute).

3. For 61, it works correct for me. Could you send me the stack trace?

You need to build RV with compiler-rt support (cmake -DRV_ENABLE_CR=on) to trigger the bug in test 61. RV will then inline the complex math function before the vectorizer pipeline runs on it. I suspect fixing test 28 will solve this one too.

PACXX only processes structured control flow and uses the old ptest instruction for rv_any/rv_all instrinsics. Would appreciate your further direction pointers for me to solve these.

Have a look at RV's src/transform/redTools.cpp. It is perfectly fine to use LLVM reduction intrinsics.. if LLVM generates incorrect code for them then LLVM is where the bug should be fixed (and reported through https://bugs.llvm.org/).

ghost commented 5 years ago

For the new commits, the following modifications were made.

  1. Add only allowing branches with one exit in LegalityCheck to exclude IF-statements with gotos for branch optimization( both BOSCC and CIF).
  2. Correct a mistake in UndeadMaskAnalysis. For those conditional uniqueBranch, oSuccIdx = opIdx + 1;(not - here). With this correction, SuccIdx could be 2, to make the negation part of implyfunction work. Consequently, the anyGuard is also generated when all check fails.

I run the test, all passed, except for test 78, which also fails without CIF.

simoll commented 5 years ago

I did some cleanup of the code. This PR was rebased, fixed and merged in e77b2037877263a0f97b9817d523fbdbca710843 . Thanks!