Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

[SystemZ] Assertion "Invalid truncate node, src < dst!" failed #40750

Closed Quuxplusone closed 5 years ago

Quuxplusone commented 5 years ago
Bugzilla Link PR41780
Status RESOLVED FIXED
Importance P enhancement
Reported by Jonas Paulsson (paulsson@linux.vnet.ibm.com)
Reported on 2019-05-06 17:20:34 -0700
Last modified on 2019-06-18 09:34:47 -0700
Version trunk
Hardware PC Linux
CC llvm-bugs@lists.llvm.org, paulsson@linux.vnet.ibm.com, uweigand@de.ibm.com
Fixed by commit(s)
Attachments tc_invalidtrunc.ll (4782 bytes, text/plain)
Blocks
Blocked by
See also
Created attachment 21901
reduced test case

llc -mcpu=z13 -O3 -o out.s ./tc_invalidtrunc.ll

SystemZTargetLowering::combineExtract() tries to truncate from i16 to i32.

I am a bit confused about all the implications of extracting a vector element
from a BUILD_VECTOR using possibly another vector type (VecVT) (?), so I have
not attempted a patch yet.

The comment "We're extracting the low part of one operand of the
BUILD_VECTOR.": does this mean we are expecting at this point to truncate the
value always? Or could we make an extension in the integer case, per the
comment:

/// EXTRACT_VECTOR_ELT(VECTOR, IDX) - Returns a single element from VECTOR
/// identified by the (potentially variable) element number IDX.  If the
/// return type is an integer type larger than the element type of the
/// vector, the result is extended to the width of the return type. In
/// that case, the high bits are undefined.

If we could do an ANY_EXTEND, this test case would be handled here (although I
am not sure if thats beneficial or not), or if not, I guess it should just
"break" (return without doing anything).
Quuxplusone commented 5 years ago

Attached tc_invalidtrunc.ll (4782 bytes, text/plain): reduced test case

Quuxplusone commented 5 years ago
The involved DAG nodes are:

    t124: i16 = urem t122, Constant:i16<29265>
  t125: v8i16 = BUILD_VECTOR t124, t124, t124, t124, t124, t124, t124, t124
t112: i32 = extract_vector_elt t125, Constant:i32<0>

Arguments for call to combineExtract():

N      = t112
Op0    = t125
IndexN = 0
VecVT  = llvm::MVT::v8i16

return combineExtract(SDLoc(N), N->getValueType(0), VecVT, Op0,
                      IndexN->getZExtValue(), DCI, false);

In SystemZTargetLowering::combineExtract():

  // The number of bytes being extracted.
  unsigned BytesPerElement = VecVT.getVectorElementType().getStoreSize();
  ...
  } else if (Opcode == ISD::BUILD_VECTOR &&
    canTreatAsByteVector(Op.getValueType())) {
    // We can only optimize this case if the BUILD_VECTOR elements are
    // at least as wide as the extracted value.
    EVT OpVT = Op.getValueType();
    unsigned OpBytesPerElement = OpVT.getVectorElementType().getStoreSize();
    if (OpBytesPerElement < BytesPerElement)
      break;

(gdb) p BytesPerElement
$12 = 2
(gdb) p Op->dump()
t125: v8i16 = BUILD_VECTOR t124, t124, t124, t124, t124, t124, t124, t124
$20 = void
(gdb) p OpVT
$21 = {V = {SimpleTy = llvm::MVT::v8i16}, LLVMTy = 0x0}
(gdb) n
(gdb) p OpBytesPerElement
$22 = 2

The check that "the BUILD_VECTOR elements are at least as wide as the extracted
value" are done with OpBytesPerElement and BytesPerElement, but ResVT is wider:

EVT VT = MVT::getIntegerVT(ResVT.getSizeInBits());
Op = DAG.getNode(ISD::TRUNCATE, DL, VT, Op);

(gdb) p VT
$24 = {V = {SimpleTy = llvm::MVT::i32}, LLVMTy = 0x0}
(gdb) p Op->dump()
t124: i16 = urem t122, Constant:i16<29265>

It seems that a check may be missing that ResVT.getStoreSize() is not larger
than OpBytesPerElement?

Alternatively, it would be legal to extend an integer type, since in this case
the high bits are undefined per the SelectionDAG semantics, but this is
probably not useful enough to motivate handling this...
Quuxplusone commented 5 years ago
(In reply to Jonas Paulsson from comment #1)
> The involved DAG nodes are:
>
>     t124: i16 = urem t122, Constant:i16<29265>
>   t125: v8i16 = BUILD_VECTOR t124, t124, t124, t124, t124, t124, t124, t124
> t112: i32 = extract_vector_elt t125, Constant:i32<0>

Ah, this may be the problem right here: we have a vector of type v8i16, so a
single element should be of type i16, but the type of the extract_vector_elt is
i32.

Now, either this is just a bug; then this should be fixed at the place that
extract_vector_elt was originally created.

But it could also be that this is intentional (I guess the definition of
extract_vector_elt allows it), in particular because i16 isn't actually a legal
type so after legalization we probably *have* to use i32 here anyway.

In that case, the code probably should insert an ANY_EXTEND as required ...
Quuxplusone commented 5 years ago

It seems that this starts out as a v16i16 urem, which has an illegal type. The divisor is a vector constant splat, so SystemZTargetLowering::combineIntDIVREM() would have unrolled this if the type had been legal.

combineIntDIVREM() however then is called again after type legalization, at which point the vector type is legal: v8i16. So now it unrolls it, which seems like a bug, since then we get an i16 urem, which has an illegal type (after type legalization).

Suggested patch: https://reviews.llvm.org/D62036.

Not quite sure if we need to fix combineExtract() - it seems that since ResVT may legally be wider than the BUILD_VECTOR element, we should maybe also fix that to do an ANY_EXTEND. Or is that supposed to never happen?

Quuxplusone commented 5 years ago

Suggested patch for combineIntDIVREM() committed as r360965, which handles the test case.

Can this be abandoned now (or should we fix also combineExtract() to handle an extending integer extraction)?

Quuxplusone commented 5 years ago
(In reply to Jonas Paulsson from comment #4)
> Suggested patch for combineIntDIVREM() committed as r360965, which handles
> the test case.
>
> Can this be abandoned now (or should we fix also combineExtract() to handle
> an extending integer extraction)?

IMO this can be abandoned now (until we run into this again).