Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

SystemZ data layout shouldn't depend on target features #49355

Open Quuxplusone opened 3 years ago

Quuxplusone commented 3 years ago
Bugzilla Link PR50386
Status NEW
Importance P normal
Reported by Aaron Puchert (aaronpuchert@alice-dsl.net)
Reported on 2021-05-18 08:10:35 -0700
Last modified on 2021-05-19 06:58:15 -0700
Version trunk
Hardware PC Linux
CC llvm-bugs@lists.llvm.org, uweigand@de.ibm.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also https://bugzilla.opensuse.org/show_bug.cgi?id=1185952
The SystemZ backend uses different target layouts depending on target features,
specifically it adds v128:64 if vector extensions are used. This means that
modules with different target features are not ABI-compatible, and would for
example prevent a library using different features than an application. This is
unprecedented and unexpected.

The commit message in ce4c10958502 states that

    However, for compatibility with old code that may use vector types, when
    *not* using the vector facility, the old alignment rules (vector types
    are naturally aligned) remain in use.

but I don't think that having the data layout feature-dependent helps
compatibility either.

This issue surfaces e.g. in Postgres, where static modules are compiled with
the default feature set, but the runtime JIT uses the feature set of the host.
Subsequently JITted code and the static modules are incompatible.

In my view the "Vector ABI" should be made the default, or whether it is being
used should be made part of the target triple. In any event, the data layout
should not depend on the feature set and modules compiled with different
features should be compatible. (And run on hardware that supports all of the
features.)
Quuxplusone commented 3 years ago
(In reply to Aaron Puchert from comment #0)
> The SystemZ backend uses different target layouts depending on target
> features, specifically it adds v128:64 if vector extensions are used. This
> means that modules with different target features are not ABI-compatible,
> and would for example prevent a library using different features than an
> application. This is unprecedented and unexpected.

Well, the ABI for vector types *is* different on machines that support
the vector feature from those don't, for the simple fact that on the
former, vector types are passed in vector registers, and on the latter,
they're not (because the machine does not *have* vector registers).
This is true on many architectures, not just SystemZ.

On our platform, there is indeed a second ABI change that is more
unusual, and affects the alignment of vector types.  However, that
should in principle not cause any *additional* compatibility issues
given that code that uses vector types is incompatible between
architectures versions anyway, given the above.

> The commit message in ce4c10958502 states that
>
>     However, for compatibility with old code that may use vector types, when
>     *not* using the vector facility, the old alignment rules (vector types
>     are naturally aligned) remain in use.
>
> but I don't think that having the data layout feature-dependent helps
> compatibility either.

Having different data layout strings in LLVM is not about "helping"
cross-architecture compatibility, it is required to generate correct
code for the ABI used for the *current* target architecture. This ABI
is used by other compilers on the platform, and at this point is not
something that can be changed.

> This issue surfaces e.g. in Postgres, where static modules are compiled with
> the default feature set, but the runtime JIT uses the feature set of the
> host. Subsequently JITted code and the static modules are incompatible.
>
> In my view the "Vector ABI" should be made the default, or whether it is
> being used should be made part of the target triple. In any event, the data
> layout should not depend on the feature set and modules compiled with
> different features should be compatible. (And run on hardware that supports
> all of the features.)

If modules are compiled with different features and they actually use
vector types in ABI-relevant places (e.g. function parameters), then
the modules will in fact be incompatible *anyway*, and this holds true
on other platforms as well.

I think the real problem here is that the check for identical data layout
strings may cause false positives.  For example, if modules use vector
types only internally (not across module boundaries), or even not at all,
that check would still detect an incompatibility where actually none
exists.

We have a similar compat check in the GNU toolchain (gcc + ld), but
there we try to more precise and avoid false positives in those cases.
Maybe the LLVM check can be improved so that e.g. if the only difference
in the data layout is in a type that is not used in the module at all,
then it shouldn't count as a conflict.

For now, I'd suggest that if you have a use case that mixes JITed code
with statically compiled code, then the JIT should be set to the same
architecture as the static code.

Looking forward, that problem hopefully becomes less of an issue:
Linux distributions are starting to move to a minimum architecture
level of z13 or higher that always has the vector feature available.
There, everything will use the new vector ABI anyway.
Quuxplusone commented 3 years ago
(In reply to Ulrich Weigand from comment #1)
> Well, the ABI for vector types *is* different on machines that support
> the vector feature from those don't, for the simple fact that on the
> former, vector types are passed in vector registers, and on the latter,
> they're not (because the machine does not *have* vector registers).
> This is true on many architectures, not just SystemZ.
Which target has a feature-dependent calling convention? With x86, the 32-bit
ABI will never use xmm registers even with -msse, and the 64-bit ABI will
always use them, but never the ymm registers (even though we might have AVX).

> On our platform, there is indeed a second ABI change that is more
> unusual, and affects the alignment of vector types.  However, that
> should in principle not cause any *additional* compatibility issues
> given that code that uses vector types is incompatible between
> architectures versions anyway, given the above.
This is not about code that uses vector types, but code that doesn't. One
should be able to run such code on a machine with vector support, and link it
with other code that uses vector types.

> Having different data layout strings in LLVM is not about "helping"
> cross-architecture compatibility, it is required to generate correct
> code for the ABI used for the *current* target architecture.
Correct, it's the triple that encodes the ABI. But that's the problem: the data
layout here is not a function of the triple alone, it is also affected by
compiler flags.

The vector and non-vector ABI should get different triples.

> This ABI is used by other compilers on the platform, and at this point
> is not something that can be changed.
Which ABI? The point of ABIs is to guarantee binary compatibility, but haven't
we just established that there is no binary compatibility? How can we have
cross-compiler compatibility if not even modules produced by the same compiler
are compatible?

> If modules are compiled with different features and they actually use
> vector types in ABI-relevant places (e.g. function parameters), then
> the modules will in fact be incompatible *anyway*, and this holds true
> on other platforms as well.
Modules with the same triple must be compatible, regardless of the feature set
they were compiled with. I'm not aware of any other platform that violates
this, and it's surely not x86.

> I think the real problem here is that the check for identical data layout
> strings may cause false positives.  For example, if modules use vector
> types only internally (not across module boundaries), or even not at all,
> that check would still detect an incompatibility where actually none
> exists.
I don't think so. You'd rather want false positives here than false negatives.
Going around trying to find out if ABI differences surface in a module is
pretty hacky, and if you're linking object files (as opposed to IR) I'm not
sure you can properly check it at all.

And even if you could, what would you do then? Users rightfully assume that
they can link together modules with different feature sets, as long as their
triple matches. So just erroring out isn't an option.

I think the proper solution for calling conventions (that LLVM implements to my
knowledge) is to change the calling convention for internal (or internalized)
functions to something more efficient. Then you can put vectors in registers
for such functions without creating incompatibilities.

When it comes to data layout, you'll necessarily need different triples. Unless
you're arguing that code not using vector extensions is likely enough not using
vector data types, so you could just always go with the 8-byte alignment.

> For now, I'd suggest that if you have a use case that mixes JITed code
> with statically compiled code, then the JIT should be set to the same
> architecture as the static code.
They are the same architecture (= triple), they just use different feature sets.

I will recommend as a workaround to use the same feature set, but only on s390x
because it's needlessly pessimistic on other architectures.