Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Pick any COMDAT with different alignment causes linking issue for Windows ARM64 #40476

Open Quuxplusone opened 5 years ago

Quuxplusone commented 5 years ago
Bugzilla Link PR41506
Status NEW
Importance P normal
Reported by Tom Tan (Tom.Tan@microsoft.com)
Reported on 2019-04-15 15:01:36 -0700
Last modified on 2019-05-06 14:09:15 -0700
Version unspecified
Hardware PC Windows NT
CC efriedma@quicinc.com, llvm-bugs@lists.llvm.org, nicolasweber@gmx.de, rnk@google.com
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also

LLD supports linking COFF files compiled from both Clang/LLVM and MSVC together, both compilers could generate the same global/static data in different alignment which only works with the respective compiler.

For example, there is a static variable in 8 bytes with type struct _Mbstatet in MSVC CRT header file. Both MSVC and Clang compile this static data as COMDAT with pick any COMDAT selection. MSVC gives it 8 bytes alignment but Clang align it on 4 bytes.

Both alignments work fine within each compiler. MSVC loads this 8 bytes data via below pattern:

ADRP Xn, PageAddress ; take page address into Xn LDR Xt, [Xn, #imm] ; load 8 bytes struct data

the accessing code generated by Clang looks below:

ADRP Xn, PageAddress ; take page address into Xn ADD Xn, Xn, PageOffsetStaticVar ; add page offset to Xn LDR Xt, [Xn] ; load 8 bytes struct data

But if linking both code patterns together, LLD/link would fail to apply relocation for the former (MSVC) code snippet if COMDAT selection chooses the data with 4 bytes alignment from Clang, because #imm in 64-bit LDR instruction will be scaled by 8 bytes, so it can only load data aligned at 8 bytes. The latter code snippet is fine because relocation happens in ADD instruction instead of LDR instruction.

Because the code generated by MSVC and Clang make sense for themself, could this be fixed by changing COMDAT selection to prefer the biggest alignment for IMAGE_COMDAT_SELECT_ANY, and just for ARM64? I prototyped the change in ObjFile::handleComdatSelection and verified it works.


below is the minimum repro code:

// header.h typedef struct _Mbstatet { unsigned long _Wchar; unsigned short _Byte, _State; } _Mbstatet;

class Test { public: _Mbstatet Init() { static _Mbstatet static_stat; static _Mbstatet static_stat1;

    _Mbstatet new_stat = static_stat;
    _Mbstatet new_stat1 = static_stat1;

    return new_stat1;
}

};

// t1.cpp, compile this by clang-cl --target=arm64-windows /c /Gy /Gw /O2 /Zi t1.cpp

include "header.h"

_Mbstatet bar();

_Mbstatet foo() { Test test; return test.Init(); }

int main() { foo(); bar(); }

// t2.cpp, compile this by ARM64 MSVC, cl /c /Gy /Gw /O2 /Zi t2.cpp

include "header.h"

_Mbstatet bar() { Test test; return test.Init(); }

Then run lld-link.exe t1.obj t1.obj will report following error:

lld-link: error: misaligned ldr/str offset

Quuxplusone commented 5 years ago

Isn't it also an abi issue that clang-cl produces different alignment than cl.exe? Shouldn't the fix be in clang-cl? (Honest question, not sure.)

Quuxplusone commented 5 years ago

The issue isn't the alignment of the type; it's the alignment of the specific global. The ABI alignment of an global declaration of type _Mbstatet is 4; if there's a strong definition of the global in the same file, the compiler can increase it to 8. clang and MSVC agree here.

The difference is specifically the alignment of a an inline global. On the compiler side, clang assumes it can't increase the alignment, and therefore doesn't increase it. cl assumes it can increase the alignment. (It's possible you could reproduce the issue without using clang at all...)

On an ELF target, it truly can't increase the alignment. As far as I know, there is no linker rule which would require the linker to choose the definition with the higher alignment.

I guess PE-COFF has different rules? If it does, those rules should be documented somewhere... https://docs.microsoft.com/en-us/windows/desktop/debug/pe-format currently just says "Any section that defines the same COMDAT symbol can be linked".

Quuxplusone commented 5 years ago

Thanks Nico and Eli.

MSVC does size (total size, not element size) based alignment for global symbols on ARM64 which is copied from AMD64. It is only causing issue to ARM64 because of alignment requirement by its relocation model.

Is it suggested to code this rule in Clang in ms compatible mode?

Quuxplusone commented 5 years ago

What behavior are you suggesting should be changed in the compiler? Is the compiler supposed to increase the alignment of externally visible globals based on the size, even when optimization is turned off? If so, sure, we can do that in clang. (That seems a little surprising, but I guess it's self-consistent.)

Quuxplusone commented 5 years ago
I meant adding the size based alignment policy to Clang in MSVC compatible mode.

This policy is applied when MSVC back-end reads any global symbol from MSVC
front-end. Below are the rules:

1. The symbol is not explicitly aligned, like __declspec(align(#)).
2. The symbol is for user data, like not for EH.
3. Current alignment is specified by MSVC front-end.
4. If symbol size >= 64 bytes, align to the max of current alignment and 16
bytes.
5. If symbol size >= 8 bytes, align to the max of current alignment and 8 bytes.
6. If symbol size >= 2 bytes, align to the max of current alignment and 2 bytes.

The alignment issue in this bug is adjusted by item 5 above.

The same rules could be applied to Clang, or like in my previous prototype,
just choose the biggest alignment when linking (this has lower risk than adding
above rules to Clang).
Quuxplusone commented 5 years ago
Is that the complete set of compiler rules?  I just briefly tried the
following, and it looks like MSVC assumes it's aligned, despite the rules you
listed:

typedef struct _Mbstatet {
    unsigned char _Wchar[4];
} _Mbstatet;
inline _Mbstatet t;
_Mbstatet f() { return t; }

Otherwise, that description seems to match MSVC, and it makes sense.  Could you
add those rules to the ARM64 ABI document?
Quuxplusone commented 5 years ago

This is the complete list of size based alignment policy in MSVC ARM64 back-end. Maybe there is some other optimization also changes alignment which I am not aware of.

For the specific example, item 6 should be applied so 4 bytes is the final alignment. Do you get some different value?

I'll try to add this to ARM64 ABI doc.

Quuxplusone commented 5 years ago
(In reply to Tom Tan from comment #7)
> For the specific example, item 6 should be applied so 4 bytes is the final
> alignment. Do you get some different value?

Item 6 says the alignment should be 2?  (I assume you meant to have a rule for
4 bytes, but it isn't there.)
Quuxplusone commented 5 years ago
(In reply to Eli Friedman from comment #8)
> (In reply to Tom Tan from comment #7)
> > For the specific example, item 6 should be applied so 4 bytes is the final
> > alignment. Do you get some different value?
>
> Item 6 says the alignment should be 2?  (I assume you meant to have a rule
> for 4 bytes, but it isn't there.)

There is no direct rule for >= 4 bytes, the above rule list is still accurate.

For your specific example, the alignment from front-end is 4, and the final
alignment is max(4, 2), which is still 4.
Quuxplusone commented 5 years ago
(In reply to Tom Tan from comment #9)
> (In reply to Eli Friedman from comment #8)
> > (In reply to Tom Tan from comment #7)
> > > For the specific example, item 6 should be applied so 4 bytes is the final
> > > alignment. Do you get some different value?
> >
> > Item 6 says the alignment should be 2?  (I assume you meant to have a rule
> > for 4 bytes, but it isn't there.)
>
> There is no direct rule for >= 4 bytes, the above rule list is still
> accurate.
>
> For your specific example, the alignment from front-end is 4, and the final
> alignment is max(4, 2), which is still 4.

There is actually an correction to item 6 above, it should be below.

6. If symbol size >= 2 bytes, align to the max of current alignment and 4 bytes.

So the alignment will be 4 bytes for below test code.

typedef struct _Mbstatet {
    unsigned char _Wchar[2];
} _Mbstatet;
inline _Mbstatet t;
_Mbstatet f() { return t; }
Quuxplusone commented 5 years ago

The layout alignment rules are added to ARM64 ABI doc as below (it is under review). Which part in Clang could be updated accordingly?

https://github.com/MicrosoftDocs/cpp-docs/pull/968

Quuxplusone commented 5 years ago

ASTContext::getDeclAlign() is the appropriate place to change, I think. Should be similar to the existing getMinGlobalAlign() codepath.

Quuxplusone commented 5 years ago

This was fixed by https://reviews.llvm.org/rL359744