Creating vector from casted operations is not vectorized

Quuxplusone commented 12 years ago


Bugzilla Link	PR13837
Status	RESOLVED FIXED
Importance	P enhancement
Reported by	Weiming Zhao (weimingz@codeaurora.org)
Reported on	2012-09-13 13:17:55 -0700
Last modified on	2018-01-23 14:06:28 -0800
Version	trunk
Hardware	PC Windows NT
CC	llvm-bugs@lists.llvm.org, spatel+llvm@rotateright.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also	PR35732

When a vector is created by casting each element of another vector, LLVM fails
to generate a vectorized cast. For example:

int4 conv4i(float4 in)
{
  int4 out={ (int)in.x, (int)in.y, (int)in.z, (int)in.w};
  return out;
}
where int4 and float4 are vector data types.
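
For anyone trying to reproduce this, here is a minimal sketch of how the example could be made self-contained. It assumes the GCC/Clang `vector_size` attribute and uses subscripts instead of the `.x`/`.y` accessors from the report (those need OpenCL-style vector types). The names `conv4i_scalar` and `conv4i_vector` are made up for illustration; `__builtin_convertvector` is the Clang/GCC builtin that expresses the whole-vector cast directly.

```c
/* Assumed typedefs: the report does not show how int4/float4 are declared. */
typedef float float4 __attribute__((vector_size(16)));
typedef int   int4   __attribute__((vector_size(16)));

/* Element-by-element cast, same shape as conv4i in the report. */
int4 conv4i_scalar(float4 in)
{
  int4 out = { (int)in[0], (int)in[1], (int)in[2], (int)in[3] };
  return out;
}

/* Whole-vector cast: the form the report expects the compiler to reach. */
int4 conv4i_vector(float4 in)
{
  return __builtin_convertvector(in, int4);
}
```

Both functions should return the same lanes for in-range inputs; the interesting part is whether the first one compiles down to the same single vector convert as the second.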

LLVM generates scalar casts. For example, on ARM, the code looks like:
conv4i:
    vmov    d1, r2, r3
    vmov    d0, r0, r1
    vcvt.s32.f32    s7, s3
    vcvt.s32.f32    s6, s2
    vcvt.s32.f32    s5, s1
    vcvt.s32.f32    s4, s0
    vmov    r0, r1, d2
    vmov    r2, r3, d3
    bx  lr
instead of:
conv4i:
    vmov    d17, r2, r3
    vmov    d16, r0, r1
    vcvt.s32.f32    q8, q8
    vmov    r0, r1, d16
    vmov    r2, r3, d17
    bx  lr

The reason is that DAGCombine, when it visits a BUILD_VECTOR, does not check
whether all of its elements come from the same kind of cast. If they do, it
should do the BUILD_VECTOR first, followed by a single vector cast.

Take the above C code for example:
before DAGCombine, the DAG looks like:
                    <float x 4>
      /             |        |          \
 extract_0    extract_1  extract_2  extract_3
     |              |        |          |
   fp2int          fp2int  fp2int      fp2int
     \              |        |          /
                 BUILD_VECTOR(int x 4)

It should be folded into:
                   <float x 4>
      /             |        |          \
 extract_0    extract_1  extract_2  extract_3
      \              |        |          /
               BUILD_VECTOR(float x 4)
                       |
                     fp2int

Later, the extractions and the BUILD_VECTOR will cancel each other out.
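
The check described above can be sketched as a toy model in plain C. This is not LLVM code: the `Node` struct, the `OP_*`/`TY_*` enumerators, and `build_vector_of_same_cast` are all hypothetical stand-ins for SelectionDAG nodes, meant only to show the condition under which the fold would be legal (every element is the same cast opcode applied to the same source type).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy opcodes and types; these are NOT LLVM's ISD or MVT enumerators. */
enum Opcode { OP_EXTRACT_ELT, OP_FP_TO_SINT, OP_SINT_TO_FP, OP_BUILD_VECTOR };
enum ScalarType { TY_F32, TY_I32 };

struct Node {
  enum Opcode op;
  enum ScalarType type;        /* scalar result type of this node */
  const struct Node *operand;  /* single input for casts, NULL otherwise */
};

/* Returns true when every element feeding a BUILD_VECTOR is produced by the
   same cast opcode from the same source type. On success, *cast_op holds the
   cast that could be applied once to a BUILD_VECTOR of the uncast inputs. */
bool build_vector_of_same_cast(const struct Node *const *elts, size_t n,
                               enum Opcode *cast_op)
{
  if (n == 0)
    return false;
  for (size_t i = 0; i < n; ++i) {
    const struct Node *e = elts[i];
    bool is_cast = e->op == OP_FP_TO_SINT || e->op == OP_SINT_TO_FP;
    if (!is_cast || e->operand == NULL)
      return false;                       /* element is not a cast */
    if (e->op != elts[0]->op)
      return false;                       /* mixed cast opcodes */
    if (e->operand->type != elts[0]->operand->type)
      return false;                       /* mixed source types */
  }
  *cast_op = elts[0]->op;
  return true;
}
```

A real combine would also have to confirm that the vector form of the cast is legal or profitable on the target, which this sketch ignores.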

Quuxplusone commented 6 years ago

Can we resolve this bug? This is handled in IR by SLP vectorization now.
Without SLP, we have:

define <4 x i32> @conv4i(<4 x float> %in) {
entry:
  %0 = extractelement <4 x float> %in, i64 0
  %conv = fptosi float %0 to i32
  %vecinit = insertelement <4 x i32> undef, i32 %conv, i32 0
  %1 = extractelement <4 x float> %in, i64 1
  %conv1 = fptosi float %1 to i32
  %vecinit2 = insertelement <4 x i32> %vecinit, i32 %conv1, i32 1
  %2 = extractelement <4 x float> %in, i64 2
  %conv3 = fptosi float %2 to i32
  %vecinit4 = insertelement <4 x i32> %vecinit2, i32 %conv3, i32 2
  %3 = extractelement <4 x float> %in, i64 3
  %conv5 = fptosi float %3 to i32
  %vecinit6 = insertelement <4 x i32> %vecinit4, i32 %conv5, i32 3
  ret <4 x i32> %vecinit6
}

And after:

$ ./opt -slp-vectorizer 13837.ll -S |grep fptosi
  %0 = fptosi <4 x float> %in to <4 x i32>

Quuxplusone commented 6 years ago

The bug was reported five years ago. Since it's not an issue now, we can close it.

Quuxplusone commented 6 years ago

I added a test for this example to prevent regression:
https://reviews.llvm.org/rL323269

Feel free to adjust the target for the test if I got that wrong.
