Legality checks vs vectorizer features

Hey, This is more a general question about the separation often cited between legality checks vs the features of a vectorizer (seen here, but also in Intel VPlan slides)

It is said "RV does not perform exhaustive legality checks ", but I'm failing to see how a legality check is not vectorizer dependent? For example, a vectorizer could be able to vectorize a conditional while another would not, but what would be the output of a legality check without the knowledge of way the vectorizer is working? At some point, the legality check has to generate the same kind of "execution mask" for the code to check that there are no overlapping writes or read depending on previous writes...etc. so How this separation can work in practice?

You are right that the precise notion of legality depends on the vectorizer. What is meant by separation of legality checks is that some parts of legality checking are not implemented in the vectorizer itself but offloaded to other analysis.

For example, the least common denominator among loop vectorizers is that there must not be any data races between loop iterations (within a minimal dependence distance). A third party has to establish that property, eg an loop iteration analysis or users through pragmas (pragma omp simd, ..). On the other hand, detection of recurrence/reduction patterns is usually handled by the vectorizer itself since it is very implementation specific.

In practice, regarding the example of overlapping memory accesses, i see the following approach for RV: 1) have an analysis that detects potential runtime dependences. 2) Transform the loop to make the dependences explicit: one approach is to insert vector-length agnostic intrinsics in the loop (cf vpconflict in Intels "FlexVec" PLDI'16 paper).

// scalar loop
for (int i = 0; i < n; ++i) {
  A[i] = A[f(i)];
}

// prepared loop (vector-length agnostic style)
for (int i = 0; i < n;) {
  iVec = f([i:i+n]);
  conflictIndex = rv_conflict([i:i+n], iVec)
  // the hypothetical rv_conflict returns the first index j
  // with j >= i where f(j) is in the interval [i:j].
  // Inputs are varying, return value is uniform.

  if (i < conflictIndex) {
    // this will execute without data race.
    A[i] = A[iVec];
  }
  i = conflictIndex + rv_strided(1);
  // advance to the next chunk,
  // rv_strided(1) will evaluate to [0, 1, ...].
}

3) Run the default RV pipeline on the prepared loop. Since all dependences (on memory) have been encoded/resolved explicitely by means of RV's intrinsics, it will generate correct code.

cdl-saarland / rv

Legality checks vs vectorizer features #16