Open noahfalk opened 2 months ago
@janvorli - is this behavior you've seen before, or do you have any thoughts on what might be the underlying cause? I assume windbg (or VS) is calling WriteVirtualMemory to modify the memory region that is RX in-proc, but I would have expected the write to get mirrored through the mapping rather than breaking the mapping.
fyi @mikelle-rogers @dotnet/dotnet-diag
Thanks Noah for the first level of investigation! I'll take a look. I remember that in the past, when we enabled W^X, WinDbg was unable to install breakpoints in the double mapped executable memory at all, because their code was refusing to do that in file (paging file in our case) mapped memory. My gut feeling is that the cause of the current issue is that they somehow fixed it by changing the mapping to not be file mapped, which breaks the double mapping. But let me investigate it further.
@noahfalk It seems the problem is fully on the WinDbg side. I have also disabled caching of RW mappings in the runtime, so each time a RW mapping is requested for an existing RX address, a new call to this function is made, so the OS provides a brand new mapping: https://github.com/dotnet/runtime/blob/54280788590d6012758302d4056aa45720133be2/src/coreclr/minipal/Windows/doublemapping.cpp#L200-L207 The offset it uses is the same offset that the RX mapping is using and yet the changes made through the new RW mapping are not visible through the RX one.
I am not sure what WinDbg uses to set the breakpoint, so I don't have any idea yet how it could break the double mapping. I wonder if it could somehow turn on copy-on-write, and if that would end up allocating and mapping a different physical page for the region where the breakpoint was placed.
@noahfalk It seems the problem is fully on the WinDbg side
Interesting. @mikelle-rogers mentioned to me that she saw the same behavior repro when using VS debugger but when I tried just now I couldn't get it to repro. Originally I assumed this was something generic any debugger was hitting because of that, but now I've got conflicting info. Let me reach out to some windbg folks to get their thoughts.
@noahfalk, which version of Visual Studio were you using?
VS Version 17.12.0 Preview 1.0
After some further discussion and investigation it looks like Windbg is changing the VM page to copy-on-write as part of writing a breakpoint. This disconnects the RX view of the page from the underlying memory mapping. Windbg team is planning to adjust their breakpoint setting logic to detect these pages and avoid making them copy-on-write.
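The copy-on-write effect described above can be sketched outside the runtime. This is only an analogy, not the runtime's or WinDbg's code: it uses Python's `mmap` module, a temporary file as the shared backing store, and `ACCESS_COPY` (a copy-on-write view) to stand in for the RX view after the debugger's write. The function name, offsets, and byte values are hypothetical placeholders.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE
FIRST = b"\x11\x22\x33"   # placeholder bytes written before the breakpoint
REST = b"\x44\x55\x66"    # placeholder bytes written after the breakpoint
BP_OFFSET = 0x100         # hypothetical breakpoint location in the same page


def simulate_breakpoint_cow():
    """Return (rw_bytes, rx_bytes) after a copy-on-write write detaches 'rx'."""
    with tempfile.TemporaryFile() as backing:
        backing.truncate(PAGE)  # one zero-filled page of backing store
        # Shared read/write view: plays the role of the runtime's RW mapping.
        rw = mmap.mmap(backing.fileno(), PAGE, access=mmap.ACCESS_WRITE)
        # Copy-on-write view: plays the role of the RX mapping once a
        # debugger has switched the page to copy-on-write.
        rx = mmap.mmap(backing.fileno(), PAGE, access=mmap.ACCESS_COPY)
        try:
            rw[0:3] = FIRST                          # stub partially written
            rx[BP_OFFSET:BP_OFFSET + 1] = b"\xcc"    # "breakpoint" write -> private copy
            rw[3:6] = REST                           # rest of the stub, via RW only
            return bytes(rw[0:6]), bytes(rx[0:6])
        finally:
            rx.close()
            rw.close()


if __name__ == "__main__":
    rw_bytes, rx_bytes = simulate_breakpoint_cow()
    print("RW view:", rw_bytes.hex())
    print("RX view:", rx_bytes.hex())
```

In this sketch the copy-on-write view keeps the bytes that were present when its private copy was made, and the bytes written afterwards through the shared view never reach it, which mirrors the zeroed tail seen through the RX address in this issue.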
Description
Using windbg (or VS) to set breakpoints on jitted code causes applications to hit AVs. The same app runs correctly under the debugger when no breakpoints are set. The underlying issue appears to be some failure in the double memory mapping used by WriteXorExecute when a debugger modifies the RX portion of the address space while setting breakpoints.
Reproduction Steps
Build a test app with this code.
Run the app with windbg as the debugger.
When windbg initially breaks in, run the command "sxe ld coreclr" and continue.
When windbg stops again, run the command "!bpmd ConsoleApp23 Program.Main" and continue. Replace ConsoleApp23 with whatever executable name your test app compiled as.
When the debugger stops at Program.Main, continue once more.
Expected behavior
The app should run to completion successfully
Actual behavior
The app will crash with an AV. The AV callstack looks like this:
The AV occurs because 0x00007ff9`db6b4720 points to zeroed memory.
Regression?
Unknown, but I wouldn't be surprised if the issue was introduced when WriteXorExecute introduced double mapped memory
Known Workarounds
I assume disabling the W^X feature would do it, but I haven't verified that specifically. Debugging is possible if you carefully avoid placing any breakpoints in the RX memory regions that are double mapped.
Configuration
I reproed this on 9.0.0-preview.7.24405.7, x64, Windows
Other information
I did some debugging into it and here is what I've seen so far.
If you don't set the breakpoint on Main(), the memory that was zeroed and triggered the AV instead contains assembly that looks like this:
This assembly code is generated by DynamicHelpers::CreateHelperArgMove:
After generating the code, execution returns back up to DelayLoad_Helper_Obj, which tailcalls the code it just created. If the breakpoint on Main() wasn't set, the memory will be correct; if the breakpoint was set, it will be zeroed instead.
When CreateHelperArgMove() writes the code bytes into memory, it writes them using a pointer into a memory region with ReadWrite permissions. This memory is supposed to be double mapped causing the same assembly bytes to appear at a different address that has ReadExecute permissions. The tailcall invokes the RX address.
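As a rough illustration of the double mapping described here (not the runtime's implementation, which maps views of a paging-file-backed section on Windows), the sketch below maps the same backing store twice with Python's `mmap`: one writable view standing in for the RW address and one read-only view standing in for the RX address. The function name and the use of a temporary file are assumptions made for the example, and the second view is read-only rather than executable.

```python
import mmap
import tempfile


def write_through_rw_read_through_rx(code: bytes) -> bytes:
    """Write 'code' via one view and read it back via a second view."""
    with tempfile.TemporaryFile() as backing:
        backing.truncate(mmap.PAGESIZE)  # one zero-filled page of backing store
        # Both views are shared mappings of the same backing page.
        rw = mmap.mmap(backing.fileno(), mmap.PAGESIZE, access=mmap.ACCESS_WRITE)
        rx = mmap.mmap(backing.fileno(), mmap.PAGESIZE, access=mmap.ACCESS_READ)
        try:
            rw[:len(code)] = code          # writer only touches the RW view
            return bytes(rx[:len(code)])   # reader only touches the other view
        finally:
            rx.close()
            rw.close()


if __name__ == "__main__":
    stub = b"\x90\x90\xc3"  # placeholder bytes, not the real helper stub
    print(write_through_rw_read_through_rx(stub).hex())
```

When both views stay shared, bytes written through the first view are readable through the second, which is the behavior the tailcall into the RX address relies on.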
I hypothesized that setting the debugger breakpoint somehow causes the RX page to become unmapped from the RW page and did the following experiment to validate that:
Run the repro code again, however at step (4) use bpmd to set a breakpoint at CombineImpl:
Continue and hit the breakpoint at CombineImpl. Now set a breakpoint at coreclr!DynamicHelpers::CreateHelperArgMove and run to it.
In CreateHelperArgMove, step through the code into FindRWBlock and note the RW <-> RX mapping:
Continue stepping back out to DynamicHelpers::CreateHelperArgMove and step through a portion of generating the assembly.
Run !bpmd Program.Main to make windbg set the breakpoint at Main(). Notice that the address of Main() falls in the same RX region used for the assembly stub:
Continue stepping in DynamicHelpers::CreateHelperArgMove to generate the rest of the assembly.
Now you can observe in the disassembly view that the RW portion of memory contains all three instructions:
But the RX view of the memory only contains the instructions that were written before the breakpoint at Main() was set: