Quuxplusone / LLVMBugzillaTest

"Cannot select" for bitcasts of AVX data types #10386

Closed Quuxplusone closed 13 years ago

Quuxplusone commented 13 years ago
Bugzilla Link: PR10073
Status: RESOLVED DUPLICATE of bug 2314
Importance: P normal
Reported by: Ralf Karrenberg (karrenberg@cs.uni-saarland.de)
Reported on: 2011-06-03 04:23:16 -0700
Last modified on: 2011-06-03 13:21:19 -0700
Version: trunk
Hardware: PC, All
CC: llvm-bugs@lists.llvm.org, nadav.rotem@me.com
The AVX backend fails on mask code, such as the masks produced by VCMPPS, when
it is combined with bitwise mask operations and the corresponding bitcasts.

Masks represented as <8 x i32> should be modifiable with xor/and/or, and these
operations should be lowered to VXORPS/VANDPS/VORPS (first-generation AVX has
no 256-bit integer logic instructions; those arrive with AVX2).
It could also make sense to allow these operations on <8 x float>, matching the
C intrinsics of immintrin.h (_mm256_cmp_ps etc. produce __m256 rather than
__m256i, and _mm256_xor_ps also takes __m256 operands) and LLVM's own
intrinsics (llvm.x86.avx.cmp.ps.256 produces <8 x float>, and
llvm.x86.avx.blendv.ps.256 takes an <8 x float> operand as its condition).
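A minimal sketch of why the mask's nominal type is arbitrary, assuming a 256-bit value can be modelled as 32 little-endian bytes and each LLVM vector type as a Python `struct` format over those bytes (`cmp_lt_ps` and the format constants are hypothetical names for illustration). A compare mask lane is either all-ones or all-zeros: read as <8 x i32> it is 0xFFFFFFFF or 0, while read as <8 x float> the all-ones lane is a NaN, so the mask only carries meaning as a bit pattern.

```python
import math
import struct

# Hypothetical model: a YMM value is 32 raw bytes; a "type" is just a view.
F32x8 = "<8f"  # <8 x float>
I32x8 = "<8I"  # <8 x i32>, read unsigned so all-ones shows as 0xFFFFFFFF

def cmp_lt_ps(a, b):
    """Model of a VCMPLTPS-style compare: all-ones lane where a < b,
    all-zeros lane otherwise. Returned as raw bytes (no float rounding)."""
    fa = struct.unpack(F32x8, a)
    fb = struct.unpack(F32x8, b)
    return struct.pack(I32x8, *(0xFFFFFFFF if x < y else 0 for x, y in zip(fa, fb)))

a = struct.pack(F32x8, *(float(i) for i in range(8)))  # 0.0 .. 7.0
b = struct.pack(F32x8, *([4.0] * 8))
mask = cmp_lt_ps(a, b)

# The same 32 bytes under two views.
as_int = struct.unpack(I32x8, mask)
as_float = struct.unpack(F32x8, mask)
print([hex(x) for x in as_int])
print(as_float)
```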

Currently, code generation for most of these patterns fails with a "Cannot
select" error on a bitcast node, which suggests that LLVM mishandles the types
involved rather than the bitwise operations themselves.

Consider these examples:

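declare <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float>, <8 x float>, i8) nounwind readnone
declare <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>, <8 x float>, <8 x float>) nounwind readnone

define <8 x float> @test1(<8 x float> %a, <8 x float> %b, <8 x i32> %m) nounwind readnone {
entry:
   %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a, <8 x float> %b, i8 1) nounwind readnone
   %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %a, <8 x float> %b, <8 x float> %cmp) nounwind readnone
   ret <8 x float> %res
}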

This works fine and "llc -filetype=asm -mattr=avx" produces the expected
assembly (VCMPLTPS + VBLENDVPS).

On the other hand, this does not work:

declare <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float>, <8 x float>, i8) nounwind readnone
declare <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>, <8 x float>, <8 x float>) nounwind readnone

define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m) nounwind readnone {
entry:
   %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a, <8 x float> %b, i8 1) nounwind readnone
   %cast = bitcast <8 x float> %cmp to <8 x i32>
   %mask = and <8 x i32> %cast, %m
   %blend_cond = bitcast <8 x i32> %mask to <8 x float>
   %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %a, <8 x float> %b, <8 x float> %blend_cond) nounwind readnone
   ret <8 x float> %res
}

This should produce VCMPLTPS, VANDPS and VBLENDVPS, but llc (both the 2.9
release and the latest trunk) aborts with:

LLVM ERROR: Cannot select: 0x2510540: v8f32 = bitcast 0x2532270 [ID=16]
   0x2532270: v4i64 = and 0x2532070, 0x2532170 [ID=15]
     0x2532070: v4i64 = bitcast 0x2510740 [ID=14]
       0x2510740: v8f32 = llvm.x86.avx.cmp.ps.256 0x2510640, 0x2511340, 0x2510f40, 0x2511140 [ORD=3] [ID=12]
...
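In the dump above the and already appears as v4i64, even though the IR performs it on <8 x i32>. For bitwise logic that retyping is harmless in principle: the sketch below (hypothetical helper `and_lanes`, vectors modelled as 32 raw bytes) shows that AND-ing the same bytes as 8 x 32-bit lanes or as 4 x 64-bit lanes gives identical results. It does not explain the selection failure; it only illustrates that lane width is irrelevant to the and itself.

```python
import struct

def and_lanes(x, y, fmt):
    """Bitwise AND of two 32-byte values, viewed through struct format fmt."""
    xs, ys = struct.unpack(fmt, x), struct.unpack(fmt, y)
    return struct.pack(fmt, *(p & q for p, q in zip(xs, ys)))

# Two arbitrary 256-bit inputs.
x = bytes(range(32))
y = bytes(255 - 7 * i for i in range(32))

as_v8i32 = and_lanes(x, y, "<8I")  # AND as <8 x i32>
as_v4i64 = and_lanes(x, y, "<4Q")  # AND as v4i64, as in the DAG dump
print(as_v8i32 == as_v4i64)
```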

The same applies to or and xor.
However, one specific example works:

define <8 x float> @test3(<8 x float> %a, <8 x float> %b, <8 x i32> %m) nounwind readnone {
entry:
   %cond = xor <8 x i32> %m, %m
   %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %a, <8 x float> %b, <8 x float> %cond) nounwind readnone
   ret <8 x float> %res
}

This produces the expected code (VXORPS + VBLENDVPS), but the same pattern
fails for and/or. No cast is required in this case, which indicates that the
casts are the actual problem, not the instruction selection of the xor.

Apparently, LLVM is generally unable to handle bitcasts between <8 x i32> and
<8 x float> (and between <4 x i64> and <4 x double>), which on AVX should
always be legal no-ops, since all of these types occupy the same 256-bit YMM
register.
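To illustrate the no-op claim, the sketch below models the four 256-bit types as Python `struct` formats over the same 32 bytes (an assumption of the model, using little-endian layout as on x86; `VIEWS` is a hypothetical name). All four views have the same size, and reinterpreting four doubles equal to 1.0 (bit pattern 0x3FF0000000000000) as <8 x i32> or <4 x i64> only changes how the bytes are read, never the bytes themselves.

```python
import struct

# The four vector types from the report, as views over one 256-bit value.
VIEWS = {
    "<8 x float>": "<8f",
    "<8 x i32>": "<8I",
    "<4 x double>": "<4d",
    "<4 x i64>": "<4Q",
}

# Every view covers exactly 32 bytes, so a bitcast never needs to move data.
sizes = {ty: struct.calcsize(fmt) for ty, fmt in VIEWS.items()}

# Four doubles equal to 1.0, then "bitcast" by reading the same bytes
# through the integer views.
raw = struct.pack(VIEWS["<4 x double>"], 1.0, 1.0, 1.0, 1.0)
as_v8i32 = struct.unpack(VIEWS["<8 x i32>"], raw)
as_v4i64 = struct.unpack(VIEWS["<4 x i64>"], raw)

print(sizes)
print([hex(v) for v in as_v8i32])
```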
Quuxplusone commented 13 years ago
Ralf, thanks for the detailed report. I am trying to gather the vector-select
bugs into a single bug report, so I am marking this one as a duplicate of 2314.
I have started committing the vector-select patch (which required a
type-legalization refactoring and other changes throughout codegen). I plan to
submit more patches next week.
Cheers, Nadav

_This bug has been marked as a duplicate of bug 2314_