Loop with "# pragma clang loop interleave (enable)" is not interleaved #46636

Open Quuxplusone opened 4 years ago

Quuxplusone commented 4 years ago
Bugzilla Link PR47667
Status NEW
Importance P enhancement
Reported by Yoshinobu Oono (fj8765ah@aa.jp.fujitsu.com)
Reported on 2020-09-28 02:23:54 -0700
Last modified on 2021-03-30 22:51:33 -0700
Version trunk
Hardware PC Linux
CC david.green@arm.com, hfinkel@anl.gov, llvm-bugs@lists.llvm.org, llvm@meinersbur.de, t-kawashima@fujitsu.com, utsumi.yuichiro@fujitsu.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
If you write either "#pragma clang loop interleave(enable)" or "#pragma clang
loop vectorize(enable)" on a loop, the behavior is the same.
The reason is that they both create metadata like this:

!8 = !{!"llvm.loop.vectorize.enable", i1 true}

Users will write these pragmas when they want a loop to be vectorized or
interleaved.
However, if "#pragma clang loop interleave(enable)" is specified, the loop is
not interleaved.
I think it should be interleaved. What do you think?

I think we should do one of the following two things.
* When the metadata described above exists, apply both vectorization and
interleaving.
* Create and control separate metadata for each.

It seems that interleaving is not controlled by the pragma but controlled by
the -funroll-loops option.

By the way, "#pragma clang loop interleave(disable)" is represented by the
following metadata, so it works as expected.

!25 = !{!"llvm.loop.interleave.count", i32 1}
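
For comparison with the enable case, here is a minimal sketch of a loop that uses the disable form, which according to this report is lowered to the llvm.loop.interleave.count metadata with a value of 1. The function name add_no_interleave is hypothetical and does not come from the report.

```c
#include <assert.h>

/* Hypothetical helper for illustration only; not part of the report. */
void add_no_interleave(const double *restrict a,
                       const double *restrict b,
                       double *restrict c,
                       int n) {
  assert(n >= 0);

  /* Ask the vectorizer not to interleave this loop. */
#pragma clang loop interleave(disable)
  for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
```

The pragma is only a hint to the optimizer, so the computed values are the same with or without it.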

- interleave.c :
void foo(double * restrict a,
         double * restrict b,
         double * restrict c,
         int n) {

#pragma clang loop interleave(enable)
  for (int i=0;i<n;i++)
    c[i] = a[i] + b[i];

  return;
}

$ clang -O1 interleave.c -Rpass=loop-vector -S -Rpass-analysis=loop-vector
interleave.c:7:3: remark: the cost-model indicates that interleaving is
beneficial but is explicitly disabled or interleave count is set to 1
      [-Rpass-analysis=loop-vectorize]
  for (int i=0;i<n;i++)
  ^
interleave.c:7:3: remark: vectorized loop (vectorization width: 2, interleaved
count: 1) [-Rpass=loop-vectorize]
$ clang -O1 interleave.c -Rpass=loop-vector -S -funroll-loops
interleave.c:7:3: remark: vectorized loop (vectorization width: 2, interleaved
count: 2) [-Rpass=loop-vectorize]
  for (int i=0;i<n;i++)
  ^

- vectorize.c :
void foo(double * restrict a,
         double * restrict b,
         double * restrict c,
         int n) {

#pragma clang loop vectorize(enable)
  for (int i=0;i<n;i++)
    c[i] = a[i] + b[i];

  return;
}

$ clang -O1 vectorize.c -Rpass=loop-vector -S -Rpass-analysis=loop-vector
vectorize.c:7:3: remark: the cost-model indicates that interleaving is
beneficial but is explicitly disabled or interleave count is set to 1
      [-Rpass-analysis=loop-vectorize]
  for (int i=0;i<n;i++)
  ^
vectorize.c:7:3: remark: vectorized loop (vectorization width: 2, interleaved
count: 1) [-Rpass=loop-vectorize]
$ clang -O1 vectorize.c -Rpass=loop-vector -S -funroll-loops
vectorize.c:7:3: remark: vectorized loop (vectorization width: 2, interleaved
count: 2) [-Rpass=loop-vectorize]
  for (int i=0;i<n;i++)
  ^
Quuxplusone commented 4 years ago

Unfortunately there is no distinct metadata for interleaving and vectorization. The separate pragmas only make it appear as if these are separate options.

llvm.loop.vectorize.enable forces only one of the transformations performed by the LoopVectorize pass: either interleaving or vectorization (it might also interpret it as only forcing vectorization with a width >= 2; I am not sure about the details). In any case, since you did not specify the interleaving factor, the cost model can choose an interleave factor of 1 when vectorizing.

Quuxplusone commented 4 years ago
Hi Michael,

I am sorry for my late reply.
Thank you for your answer.
Let me confirm one thing.
When should I use "# pragma clang loop interleave (enable)"?
I couldn't think of how to use it with the current behavior.
Quuxplusone commented 4 years ago
IMHO there are two use cases for interleaving (or optimizations in general):

1. One wants a specific interleave count and specifies it using #pragma clang
loop interleave_count(<n>)
2. One does not know which interleave count to use and relies on compiler
heuristics. Do not pass a pragma in this case; the compiler will choose by
itself, possibly an interleave count of one (i.e. no interleaving).

That is, I don't see a use case for a standalone #pragma clang loop
interleave(enable), even if it were representable with the current loop metadata.
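
The first use case can be sketched as follows, based on foo() from interleave.c. The function name foo_interleave2 and the count of 2 are illustrative assumptions, not values taken from the report.

```c
#include <assert.h>

/* Hypothetical variant of foo() from interleave.c; the name and the
   interleave count of 2 are illustrative choices, not from the report. */
void foo_interleave2(const double *restrict a,
                     const double *restrict b,
                     double *restrict c,
                     int n) {
  assert(n >= 0);

  /* Request an explicit interleave count instead of interleave(enable). */
#pragma clang loop interleave_count(2)
  for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];
}
```

Compiling this with -Rpass=loop-vectorize, as in the report, should show which interleave count the vectorizer actually used.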
Quuxplusone commented 4 years ago
Hi Michael,

Thank you for your answer.
Your answer matches my understanding.

I think the interleaving behavior should change from loop to loop.
I hope that "# pragma clang loop interleave (enable)" will become a meaningful
pragma in the future.