dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.17k stars 4.72k forks source link

Epilog unwinding code may fail in certain cases #66238

Open namazso opened 2 years ago

namazso commented 2 years ago

In https://github.com/dotnet/runtime/blob/main/src/coreclr/unwinder/amd64/unwinder_amd64.cpp the UnwindEpilogue function may not work on functions where no volatile registers are saved but RFLAGS is. An example of such a function would be KiSetAddressPolicy from ntoskrnl, or you can just make one on your own:

testfn PROC FRAME
    pushfq
    .allocstack 8
    nop
    .beginepilog
    pop rcx
    ret
testfn ENDP

Unwinding this with rip pointing at the pop instruction will fail.

According to the comment in the function this is a valid canonical epilog, however this part: https://github.com/dotnet/runtime/blob/main/src/coreclr/unwinder/amd64/unwinder_amd64.cpp#L515-L516 will skip the 8 byte stack allocation when looking for push opcodes.

Note that this file says Everything below is borrowed from minkernel\ntos\rtl\amd64\exdsptch.c file from Windows a few lines before the bug, so I suspect this might be a Windows bug. Not sure where I should report this, but this repo has the buggy code so I thought it'd be appropriate.

ghost commented 2 years ago

Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.

Issue Details
In https://github.com/dotnet/runtime/blob/main/src/coreclr/unwinder/amd64/unwinder_amd64.cpp the `UnwindEpilogue` function may not work on functions where no volatile registers are saved but RFLAGS is. An example of such a function would be `KiSetAddressPolicy` from ntoskrnl, or you can just make one on your own: ``` testfn PROC FRAME pushfq .allocstack 8 nop .beginepilog pop rcx ret testfn ENDP ``` Unwinding this with rip pointing at the pop instruction will fail. According to the comment in the function this is a valid canonical epilog, however this part: https://github.com/dotnet/runtime/blob/main/src/coreclr/unwinder/amd64/unwinder_amd64.cpp#L515-L516 will skip the 8 byte stack allocation when looking for push opcodes. Note that this file says `Everything below is borrowed from minkernel\ntos\rtl\amd64\exdsptch.c file from Windows` a few lines before the bug, so I suspect this might be a Windows bug. Not sure where I should report this, but this repo has the buggy code so I thought it'd be appropriate.
Author: namazso
Assignees: -
Labels: `area-Diagnostics-coreclr`, `untriaged`
Milestone: -
tommcdon commented 2 years ago

@hoyosjs

janvorli commented 2 years ago

@namazso The comment says that the case you have mentioned should be handled by the UWOP_ALLOC_SMALL 8: https://github.com/dotnet/runtime/blob/917a0b1bfce3d664fe46587faceb056bca8936f6/src/coreclr/unwinder/amd64/unwinder_amd64.cpp#L586-L600. Have you tried to step through the unwinder code to see where it fails?

namazso commented 2 years ago

@janvorli It would, if the code earlier: https://github.com/dotnet/runtime/blob/917a0b1bfce3d664fe46587faceb056bca8936f6/src/coreclr/unwinder/amd64/unwinder_amd64.cpp#L513-L522 wouldn't already skip it (as there are no push opcodes). it increments FirstPushIndex past the ALLOC_SMALL. Then that value is ised to initialize Index: https://github.com/dotnet/runtime/blob/917a0b1bfce3d664fe46587faceb056bca8936f6/src/coreclr/unwinder/amd64/unwinder_amd64.cpp#L557 so here Index < CountOfCodes fails: https://github.com/dotnet/runtime/blob/917a0b1bfce3d664fe46587faceb056bca8936f6/src/coreclr/unwinder/amd64/unwinder_amd64.cpp#L591

I'll check again tomorrow to make sure though

tommcdon commented 2 years ago

@janvorli is this something we should fix outside of diagnostic scenarios for 7.0?

namazso commented 2 years ago

Oh wow i totally forgot that tomorrow thing. Anyways..

@tommcdon Probably no. In fact this only affects handwritten assembly, or functions that use the __getcallerseflags intrinsic, neither of which happens here. I was emulating prologs and epilogs of ntoskrnl for checking my implementation of a sanity checker that was based on the code here, and ran into this. Simply opened the bug because the code is just not correct, or at the very least not in sync with what its comments tell and MSVC does.