Quuxplusone / LLVMBugzillaTest


vectors of i1 and vectors of x86 long double don't work #1870

Open Quuxplusone opened 16 years ago

Quuxplusone commented 16 years ago
Bugzilla Link PR1784
Status REOPENED
Importance P normal
Reported by Duncan Sands (baldrick@free.fr)
Reported on 2007-11-09 02:29:09 -0800
Last modified on 2021-09-12 19:47:18 -0700
Version trunk
Hardware Other Linux
CC anton@korobeynikov.info, cameron@cs.sfu.ca, emvv@mail.ru, hfinkel@anl.gov, leissa@cs.uni-saarland.de, lennart@augustsson.net, llvm-bugs@lists.llvm.org, llvm-dev@redking.me.uk, llvm@sunfishcode.online, meadori@gmail.com, michael.hliao@gmail.com, nadav.rotem@me.com, nobled@dreamwidth.org, paulsson@linux.vnet.ibm.com, preston.gurd@intel.com, stpworld@narod.ru, tpr.ll@botech.co.uk, yilong.guo@intel.com
Fixed by commit(s)
Attachments sizes.patch (3778 bytes, text/plain)
Blocks PR31265, PR3037, PR3352, PR7303
Blocked by
See also PR34405
I noticed that a lot of the vector code assumes that there is
no padding between vector elements.  For example, the size is
assumed to be the primitive size of the element times the number
of elements.  However for x86 long double the primitive size is
80 bits while elements will be spaced by 96/128 bits (depending
on the os) in order to maintain alignment.  There's a similar
problem for vectors of apints.
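For concreteness, the spacing arithmetic can be sketched in C (a hypothetical helper, not LLVM API): rounding the 80-bit primitive size up to the element alignment yields the 96/128-bit spacing mentioned above.

```c
#include <assert.h>

/* Round a bit count up to the next multiple of align_bits.  This models an
 * array-like layout, where each element starts on an alignment boundary. */
static unsigned align_up(unsigned bits, unsigned align_bits) {
    return (bits + align_bits - 1) / align_bits * align_bits;
}
```

With the 80-bit primitive size of x86 long double, `align_up(80, 32)` gives the 96-bit spacing typical of 32-bit Linux, and `align_up(80, 128)` the 128-bit spacing of 64-bit targets.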
Quuxplusone commented 15 years ago

Does gcc support vectors of long double? Maybe it would make sense to just reject all vectors with elements that would require intra-element padding?

Quuxplusone commented 15 years ago
Yes, it looks like gcc supports vectors of long double.  I compiled
the following with it on x86-32 and it compiled (I didn't check whether
the assembly is sensible, or how it lays out the vector).

typedef long double dvec __attribute__ ((__vector_size__ (24)));
void d(dvec *x);
int main() {
  dvec x, y, z;
  d(&x);
  d(&y);
  z = x + y;
  d(&z);
  return 0;
}

Also, we may want to support vectors of boolean as results
of vector comparisons.
Quuxplusone commented 15 years ago
To be more explicit about "There's a similar problem for vectors of apints",
consider a vector <n x i1>.  getPrimitiveSizeInBits reckons this has size
n bits, i.e. that it has been bitpacked.  Yet you can do GEP on them, and
I'm willing to bet that the GEP doesn't address individual bits!  Likewise,
vast parts of codegen assume that vectors are laid out like arrays, which
in this case means that each i1 is in its own byte (also incompatible with
what getPrimitiveSizeInBits returns).

Possible solutions:
(1) bitpack vectors.  This could be done but seems like a lot of work
for not much benefit.  Also, I think some targets support vector compare
results producing a "vector of booleans".  It would be nice if this mapped
in a convenient way to a vector of i1.
(2) lay vectors out like arrays.  So elements of a vector of long double
would be spaced by 12/16 bytes depending on the target.  Vectors of i1
would typically get 1 byte per i1.
(3) byte pack vectors.  Here vectors of long double would have elements
spaced by 10 bytes (unaligned!).  Vectors of i1 would have elements 1
byte apart.
My preference is for (2).  However if GEP is disallowed for vectors,
then other schemes become more feasible.

I think Chris needs to make a policy decision here.
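The element spacing implied by each of the three options can be sketched in C (hypothetical helper, not LLVM API; `prim_bits` is the primitive element size and `align_bits` its ABI alignment, both supplied by the caller):

```c
#include <assert.h>

enum layout { BITPACK, ARRAY_LIKE, BYTEPACK };

/* Bits between the starts of consecutive vector elements under each of
 * the three schemes discussed above. */
static unsigned stride_bits(enum layout l, unsigned prim_bits, unsigned align_bits) {
    switch (l) {
    case BITPACK:    /* (1) elements packed with no padding at all */
        return prim_bits;
    case ARRAY_LIKE: /* (2) elements padded out to their alignment */
        return (prim_bits + align_bits - 1) / align_bits * align_bits;
    case BYTEPACK:   /* (3) elements padded only to a byte boundary */
        return (prim_bits + 7) / 8 * 8;
    }
    return 0;
}
```

Under (2), long double elements are spaced 96 or 128 bits apart depending on alignment; under (3) they sit 80 bits (10 bytes, unaligned) apart, and an i1 occupies a full byte.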
Quuxplusone commented 15 years ago
PS: While LegalizeDAG will happily "codegen" vectors of i1
and long double, the result is often bogus.  This is why
I put asserts in LegalizeTypes in various spots where the
logic is wrong for such vectors.
Quuxplusone commented 15 years ago
Sorry, should have been getBitWidth rather than getPrimitiveSizeInBits:
it is VectorType::getBitWidth that returns the number of elements * the
element size in bits.  Note that codegen does the same for MVT::getSizeInBits
for a vector.  The result is that vectors of i1 are considered to be
bitpacked...
Quuxplusone commented 15 years ago

_Bug 3434 has been marked as a duplicate of this bug._

Quuxplusone commented 14 years ago

_Bug 6204 has been marked as a duplicate of this bug._

Quuxplusone commented 14 years ago

Not to mention vectors of i24, and other types that are not naturally aligned.

Quuxplusone commented 14 years ago

Attached sizes.patch (3778 bytes, text/plain): proposed patch

Quuxplusone commented 14 years ago

I committed the patch, plus several fixes to CodeGen to fix lowering of vectors with element padding, in r97064.

Quuxplusone commented 14 years ago

There are still several issues, see this email:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100222/096825.html

Quuxplusone commented 14 years ago

The decision is: vectors will be bit-packed. GEP on vectors will be disallowed.

Quuxplusone commented 14 years ago

And for the record, r97064 was reverted, following the discussion in the email thread linked above.

Quuxplusone commented 13 years ago
I now think that vectors should be stored in memory in a target-dependent
manner,
which may or may not be the same as how arrays are stored in memory.  This means
fixing all the places that think you can get the size of an array without target
data, not to mention users of
  VectorType::getInteger
  VectorType::getExtendedElementVectorType
  VectorType::getTruncatedElementVectorType
  VectorType::getBitWidth
since these methods are all essentially assuming that vectors are bitpacked.

An additional issue is that bitcast of vectors becomes problematic and should
probably be disallowed.  The issue is that bitcast is only legal if the source
and destination types have the same bitwidth.  But the bitwidth of a vector will
now depend on the target, so the legality of the IR will depend on the target,
which is not currently the case.
Quuxplusone commented 13 years ago
I think it's a mistake to make bitcast be the operation which determines how
everything else must work.

I propose the first step to solving this problem is to redefine the bitcast
operator.  Instead of saying it is equivalent to a store and load in the new
type, I propose it be defined as:

 - A bitcast operator reproduces the bits of its operand in precedence order.

 - For the purposes of bitcasting, the precedence of a bit in a vector value is the
    precedence index of the bit in its element plus the element index of its
    vector element times the precision of the element type.

This is an intuitive definition, and it makes bitcast a fully target-
independent operation that doesn't care about endianness.  Bitcasting an
integer to a vector would always put the most significant bit in the same
place, which is not true under the store+load definition.

On little-endian targets, there's effectively no change.  On big-endian
targets, this would change the byte order of bitcasts between vectors and non-
vectors and between vectors with elements of different sizes.  If front-ends
desire the old behavior, they can emit explicit llvm.bswap operations.

If we accept this definition, then we can move on to consider what makes the
most sense for vectors without being artificially constrained.
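A sketch of the proposed rule in C (hypothetical helper names, not from the proposal itself): bit b of element e has precedence b + e * elem_bits, so a bitcast reproduces bits in precedence order regardless of target endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Precedence index of bit `bit` of element `elem` in a vector whose
 * elements are `elem_bits` wide, per the proposed definition. */
static unsigned precedence(unsigned elem, unsigned bit, unsigned elem_bits) {
    return bit + elem * elem_bits;
}

/* Bitcast <4 x i8> -> i32 under the precedence rule: bit k of the result
 * is the bit with precedence k in the vector, independent of endianness. */
static uint32_t bitcast_v4i8_to_i32(const uint8_t v[4]) {
    uint32_t r = 0;
    for (unsigned e = 0; e < 4; ++e)
        for (unsigned b = 0; b < 8; ++b)
            r |= (uint32_t)((v[e] >> b) & 1) << precedence(e, b, 8);
    return r;
}
```

Element 0 always lands in the low-order bits of the result, which is what makes the operation target-independent.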
Quuxplusone commented 13 years ago
Hi Dan, your suggestion seems fairly reasonable to me.  While there, why not
extend bitcast to first class structs and arrays too?
Quuxplusone commented 13 years ago
(In reply to comment #16)
> While there, why not extend bitcast to first class structs and arrays too?

There's no reason why it would be easier now than later, it would take work to
implement, and it's not needed to fix the main bug here.  And because no one
should need it, imho, but we can discuss that elsewhere.
Quuxplusone commented 13 years ago
The steps needed to implement this are:
  - teach optimizer passes which convert store+load to bitcast to check endianness and either be
     conservative or insert bswap calls
  - teach front-ends to emit llvm.bswap calls around bitcasts on big-endian targets when appropriate
  - teach codegen to convert bswap calls around such bitcasts into no-ops.

The hardest part will be doing the codegen part for PowerPC.
Quuxplusone commented 13 years ago
There's another possibility for big-endian machines: on such a machine, when
storing a vector < 0, 1, 2, 3> store the components in reverse order: i.e. 3
first, 2 at next memory location etc.  Who says that the lowest numbered index
has to be stored first?  Anyway, the result is that bitcast is then still the
same as "store as one type, load out as another" on big-endian machines too.
More generally, when a vector operation refers to element i, turn that into the
processor operation on element num_elements - i.
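The reversed-storage idea can be sketched in C (hypothetical helpers, illustrating index arithmetic only, not actual codegen):

```c
#include <assert.h>

/* On a big-endian target, store logical element i of an n-element vector
 * at storage slot n - 1 - i, as suggested above. */
static unsigned storage_index(unsigned i, unsigned n) {
    return n - 1 - i;
}

/* Store a 4-element vector, reversing element order when big_endian. */
static void store_v4(int dst[4], const int src[4], int big_endian) {
    for (unsigned i = 0; i < 4; ++i)
        dst[big_endian ? storage_index(i, 4) : i] = src[i];
}
```

Storing <0, 1, 2, 3> this way on a big-endian machine puts 3 at the lowest address, so a "store as one type, load as another" bitcast sees the same bit layout as on a little-endian machine.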
Quuxplusone commented 12 years ago

_Bug 11460 has been marked as a duplicate of this bug._

Quuxplusone commented 12 years ago

_Bug 12650 has been marked as a duplicate of this bug._

Quuxplusone commented 11 years ago

_Bug 15131 has been marked as a duplicate of this bug._

Quuxplusone commented 11 years ago

Shall we make this a parent bug for more concrete bugs caused by the same issue but in different components? The issue raised in this bug is quite high-level and covers a lot of code/design. Some bugs have the same root cause, but the fixes would be quite different and probably land in different components. Making this a parent bug would make it easier to track all related/child bugs.

Quuxplusone commented 11 years ago

Making this a parent bug makes sense to me.

Quuxplusone commented 8 years ago
A strong use case for the bitpacked approach is Parabix software.   Parabix
is a framework for high-performance text processing using the concept of
parallel bit streams.   In Parabix, bytestreams <N x i8> are transformed to
parallel arrays of bit streams <N x i1>.   The streams are then processed 256
elements at a time using the 256-bit registers of AVX2, for example.

One significant application of Parabix is the icgrep search application,
which generally achieves GB/s regular expression search even with complex
regular expressions.

We have workarounds, but bitpacked vectors of i1 would be ideal.
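The transform described above can be sketched scalar-wise in C (a simplified illustration, not Parabix code; the real framework does this with SIMD):

```c
#include <assert.h>
#include <stdint.h>

/* Transpose a byte stream into one of its eight parallel bit streams:
 * bit i of the result is bit `b` of input byte i. */
static uint8_t bitstream8(const uint8_t bytes[8], unsigned b) {
    uint8_t out = 0;
    for (unsigned i = 0; i < 8; ++i)
        out |= (uint8_t)(((bytes[i] >> b) & 1) << i);
    return out;
}
```

Applying this for b = 0..7 turns an <8 x i8> bytestream into eight <8 x i1> streams, which is exactly the representation that would benefit from first-class bitpacked i1 vectors.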
Quuxplusone commented 6 years ago
I have just come across this because I am using the SPIR-V frontend
https://github.com/KhronosGroup/SPIRV-LLVM
and I have a case where it generates a GEP into a vector of i1 in memory.

A lot of code assumes that vector of i1 is packed (as already noted here), but
at least GVN and alias analysis cannot cope with the GEP because it is not to a
byte boundary.

Given that this issue has been hanging around for 10 years and does not seem to
have a resolution, should LLVM stop claiming to fully support vector of i1?
I.e. should the spec and the verify pass say that vector of i1 is illegal, or
say that vector of i1 is legal but you can't GEP into it, or something?
Quuxplusone commented 6 years ago

Work is still progressing, although at a glacial pace - Jonas Paulsson has done some work recently for instance.

Quuxplusone commented 6 years ago

Yes, I committed the handling for stores of vectors with non-byte-sized elements, see https://reviews.llvm.org/D42100.

I think Uli and Eli had an interesting discussion there on Jan 30, taking up this very topic that I think you are concerned about.

To me, it seems that although (I think) there is no formal statement anywhere, their discussion seems to indicate the path to follow. I am thinking about exactly how a vector should be stored in memory on big/little endian machines, including non-byte sized elements.

Perhaps the optimization passes you mention can be extended, or at least learn to reject cases they cannot handle?

Quuxplusone commented 6 years ago

I have been thinking about my GEP issue some more.

On my target (which has typical byte-addressable memory and 64 bit pointers), what does a GEP into a vector of i1 mean? If you allow it, it surely means that any pointer to i1 might (or might not) be the result of a GEP into vector of i1, and therefore needs to keep a bit offset as well as the byte address. My target certainly does not support that, and more generally I think LLVM assumes that a pointer to anything (with a particular address space) is always the same size, whatever type it points to.

So my frontend should not be generating GEP into vector of i1, because it is not representable on the target that it is targeting.

I guess that in itself does not make a GEP into vector of i1 illegal IR, as someone may want to implement a target where it does make sense. But it falls into a class of things that you can only use in IR if it makes sense on your target.

If someone did want to implement a target where it makes sense, that probably means bit addressable memory, which means that the assumptions in LLVM that a store size is an integral number of bytes would need to be fixed. But I guess this is unlikely to be needed.

Quuxplusone commented 6 years ago

Here are my thoughts: GEP calculates the pointer address based on the data layout of the specified type. The vector type is explicitly specified in the target's data layout and should not be assumed to be derived from the corresponding scalar type, e.g. <8 x i1> from i1. The other point, not stated very clearly in the LLVM language reference, is that alignment (specified in bits) is always a whole number of bytes. That constraint is only stated for the natural stack alignment, but we should assume it applies to all alignments. In other words, GEP only calculates pointer addresses in units of bytes.

Back to your specific issue: I agree that your frontend should not generate a GEP into a vector of i1. Instead, it should do this in two steps: a vector load followed by extraction of the corresponding bit.
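The two-step extraction can be sketched in C (a hypothetical helper illustrating the byte-addressed replacement for a GEP to a bit offset):

```c
#include <assert.h>
#include <stdint.h>

/* Extract element `i` of a bit-packed <N x i1> stored at `p`:
 * (1) load the byte containing the element, (2) shift and mask the bit. */
static int extract_i1(const uint8_t *p, unsigned i) {
    return (p[i / 8] >> (i % 8)) & 1;
}
```

Both steps use only byte addresses, so no pointer ever needs to carry a sub-byte offset.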