Quuxplusone / LLVMBugzillaTest

lld-link /delayload - first call of a function with bad floating point parameter on x64 #51566

Open Quuxplusone opened 2 years ago

Quuxplusone commented 2 years ago
Bugzilla Link: PR52599
Status: NEW
Importance: P normal
Reported by: Thomas Ferrand (thomas.ferrand@hexagon.com)
Reported on: 2021-11-24 08:49:48 -0800
Last modified on: 2021-11-24 08:49:48 -0800
Version: 13.0
Hardware: PC Windows NT
CC: llvm-bugs@lists.llvm.org
Fixed by commit(s):
Attachments: repro-bug-lld.zip (500 bytes, application/x-zip-compressed)
Blocks:
Blocked by:
See also:

Created attachment 25475
Source files to reproduce the bug

When linking a program with a DLL using the /delayload switch, the first call
to a function defined in the DLL receives a bad value for (at least one of) the
floating-point parameters.

Attached are two source files, my_lib.cpp and my_exe.cpp, that reproduce the bug.
They should be built as follows:
 - "C:\Program Files\LLVM\bin\clang-cl.exe" my_lib.cpp /link /DLL /OUT:my_dll.dll
 - "C:\Program Files\LLVM\bin\clang-cl.exe" /c my_exe.cpp /OUT:my_exe.obj
 - "C:\Program Files\LLVM\bin\lld-link.exe" my_dll.lib Delayimp.lib /delayload:my_dll.dll my_exe.obj /OUT:my_exe.exe

When running my_exe.exe, the output is "1 0 3" instead of the expected "1 2 3".

The last step can be replaced with either of the following commands:
 - "C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.28.29910\bin\Hostx64\x64\link.exe" my_dll.lib Delayimp.lib /delayload:my_dll.dll my_exe.obj /OUT:my_exe.exe
   to use link.exe with the same options, or
 - "C:\Program Files\LLVM\bin\lld-link.exe" my_dll.lib my_exe.obj /OUT:my_exe.exe
   to use lld without /delayload.
In both of those cases the resulting executable gives the expected "1 2 3".

I believe the bug occurs because __delayLoadHelper2 (the function defined in
delayimp.lib that actually loads the DLL and locates the function we want to
call on first use) writes into the top of its caller's stack space (I don't
know why; is it a quirk of the Windows calling convention?), but the thunk
generated by lld doesn't reserve that space.

Specifically, the thunk generated by lld (for x64) looks like this:
push        rcx
push        rdx
push        r8
push        r9
sub         rsp,48h
movdqa      xmmword ptr [rsp],xmm0
movdqa      xmmword ptr [rsp+10h],xmm1
movdqa      xmmword ptr [rsp+20h],xmm2
movdqa      xmmword ptr [rsp+30h],xmm3
mov         rdx,rax
lea         rcx,[__xt_z+28h (01401C9E88h)]
call        __delayLoadHelper2 (01401A3464h)
movdqa      xmm0,xmmword ptr [rsp]
movdqa      xmm1,xmmword ptr [rsp+10h]
movdqa      xmm2,xmmword ptr [rsp+20h]
movdqa      xmm3,xmmword ptr [rsp+30h]
add         rsp,48h
pop         r9
pop         r8
pop         rdx
pop         rcx
jmp         rax

(it allocates space on the stack and uses it to save the registers before
calling __delayLoadHelper2, then restores them afterwards)

Whereas the thunk generated by link.exe looks like this:
mov         qword ptr [rsp+8],rcx
mov         qword ptr [rsp+10h],rdx
mov         qword ptr [rsp+18h],r8
mov         qword ptr [rsp+20h],r9
sub         rsp,68h
movdqa      xmmword ptr [rsp+20h],xmm0
movdqa      xmmword ptr [rsp+30h],xmm1
movdqa      xmmword ptr [rsp+40h],xmm2
movdqa      xmmword ptr [rsp+50h],xmm3
mov         rdx,rax
lea         rcx,[__DELAY_IMPORT_DESCRIPTOR_my_dll (0140435020h)]
call        __delayLoadHelper2 (01400089C2h)
movdqa      xmm0,xmmword ptr [rsp+20h]
movdqa      xmm1,xmmword ptr [rsp+30h]
movdqa      xmm2,xmmword ptr [rsp+40h]
movdqa      xmm3,xmmword ptr [rsp+50h]
mov         rcx,qword ptr [rsp+70h]
mov         rdx,qword ptr [rsp+78h]
mov         r8,qword ptr [rsp+80h]
mov         r9,qword ptr [rsp+88h]
add         rsp,68h
jmp         __tailMerge_my_dll+77h (01402237B8h)
jmp         rax

It looks very similar but, for some reason, it doesn't save the xmmN registers
at the top of the stack like lld does: it leaves 32 bytes, at [rsp, rsp+20h),
that __delayLoadHelper2 is free to overwrite.

Indeed, at least on my machine, the first two instructions of __delayLoadHelper2
are:
mov         qword ptr [rsp+10h],rbx
mov         qword ptr [rsp+18h],rsi

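which, if I'm not mistaken, write into the stack space where xmm0 and xmm1
were saved. Because the call pushed an 8-byte return address, the helper's
[rsp+10h] and [rsp+18h] are the thunk's [rsp+8] and [rsp+10h]. Those are the
upper half of the saved xmm0 and the lower half of the saved xmm1 (where a
floating-point argument lives).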
Quuxplusone commented 2 years ago

Attached repro-bug-lld.zip (500 bytes, application/x-zip-compressed): Source files to reproduce the bug