NVIDIAGameWorks / rtx-remix

Combined repo for the RTX-Remix runtime
https://www.nvidia.com/en-us/geforce/rtx-remix/
MIT License
1.38k stars 69 forks source link

F.E.A.R DXSO analyze bytecode read() null-deref during CreateVertexShader() callchain. #212

Closed watbulb closed 1 year ago

watbulb commented 1 year ago

Describe the bug

image (~1ms call-time for D3DXCreateEffect)

I'm investigating a race condition here with FEAR that uses FX shaders through d3dx9_27.dll. It appears internally that D3DXCreateEffect is creating a list of device calls for the target effect, and submitting them simultaneously (CreateVertexShader, and CreatePixelShader), which is expected. However this breaks the threadsafe logic that is required to ensure DXVK has enough time to process the shader file (imo), and is causing a race condition to occur in FEAR when D3DXCreateEffect returns with D3D_OK in a ~millisecond scenario; since the commands are sent to a queue via bridge RPC (perhaps there is an issue here with ACK, timeout, cmd dataSize, not sure). Specifically this affects the *pByteCode pointer passed to DXVK through the calls to D3DXCreateEffect and subsequently CreateVertexShader and CreatePixelShader.

Due to bridge-remix not having any exports for D3D9X functions such as D3DXCreateEffect, I assume it is unable to apply the thread-safe/locking logic required to ensure the call has finished processing shaders for this type of call before returning to the game engine code. In FEAR's case, since this call returns fine immediately, it moves onto the next effect shader(s), and replaces the underlying pointer memory backing while DXVK is still in the process of analyzing and re-compiling the previous shader, or possibly even during sending the bytecode data via RPC, as the game is unaware that it was submitted asynchronously [or so it seems as the client is submitting the command to the bridge]. Perhaps this doesn't have the effect I think it does if the entire bytecode is copied across the RPC call from the ptr, that being said D3DXCreateEffect still returns immediately even though the shaders are still processing. if true, perhaps partial data is only being copied before it becomes swapped, and I am leaning towards this being the reason.

With FEAR, we can see that while CreateVertexShader and CreatePixelShader are submitted to the bridge through D3DXCreateEffect, the runtime is sub 1ms for D3DXCreateEffect (as seen in the above screenshot). This results in a crash during read() of the bytecode data received during DXSO analyze(), out of the box. This does not happen with vanilla DXVK, and the game works perfectly, even though the codepaths in DXSO to do so are nearly identical, minus the bridge RPC of course, which is the denominator here.

I still find this pretty strange, since the pre-compiled shaders were compiled using the microsoft D3DXCompileShader() from back then, and should not have used any non-standard or strange instructions outside of the profile that would cause the fairly battle-tested DXSO compiler/disassembler to go haywire, I suppose its a possibility some of the assembly was hand modified or optimized. Its hard to pin this down. I have also tried to set threadsafe mode in bridge to true, and reduce shader compiler threads to no avail. Another possibility is the GetFunction call used to determine the size of the byteCode data is faulty during the send_data forwarding of the Create<X>Shader call due to the underlying bytecode data swapping quickly.

This took quite a long time to debug properly given I don't have access to the FEAR source and had to reverse engineer how it was creating these effects, I'm happy to answer any other questions that you may have, or even share assembly snippets of the game code, I've tried to condense what I have tried already to fix this, but I may have missed some things I have tried more recently in terms of debugging and hooking. Hopefully this was all coherent. I can also give a exported copy of my breakpoints in XML format if required.

Bonus To highlight this de-synchronization, I've setup the following breakpoints with millisecond tick reporting through break actions: its important to note here, each invocation of D3DXCreateEffect must have an associated PS and VS shader within the same bytecode ptr in sequence(VS/PS).

Therefore if 4 calls to D3DXCreateEffect are made, 8 shaders need to be processed, but we see 7 here before the crash during process of a subsequent VS shader. It is believed the shader that causes this is: ShadowBlur_BlurBuffer.fxo initially, but it not limited to that one shader (other shaders exhibit this issue once in-game). Removing the shader from the FEAR shader archive allows the game to proceed to the menu. It should be noted the size of this shader is the largest in bytes that is sent at this point of the loading process.

bp_trace.txt

For non-nvidia readers: PLEASE DO NOT COMMENT THAT SHADERS DO NOT WORK WITH REMIX, I KNOW WHAT I AM DOING

Environment Info:

Attach logs! NvRemixBridge.exe_20230627_075825.dmp.zip NvRemixBridge.log FEAR_d3d9.log.Access.Violation.txt FEAR_d3d9.log.dxso.Callstack.txt FEAR_d3d9.log.dxso.Trace.txt

To Reproduce

  1. Install latest remix, as one normally would into the FEAR game root. 1a. Optionally use the following autoexec.cfg (rename from *.txt) for windowed mode and some other toggles: autoexec.txt
  2. Open F.E.A.R. Platinum
  3. The game will crash on startup.

Expected behaviour

  1. At the very least, remix should hook or throw errors about shader compatibility or issues regarding them, not crash during analyze() reading of the DXSO bytecode transferred by the game during the D3DXCreateEffect->CreateVertexShader call through IPC/RPC.

Hitman: Codename 47 also exhibits this exact same behaviour and problem, see this linked issue: https://github.com/NVIDIAGameWorks/rtx-remix/issues/209

Screenshots N/A

watbulb commented 1 year ago

GuJian 1 has this same issue as well.

nv-ajaus commented 1 year ago

The bridge client wasn't determining the compiled shader length correctly which could be causing this issue. Bridge commit a06bdc0, which maps to Build # 16 on github, is intended to fix the length function. Can you please try it out to see if it helps?

watbulb commented 1 year ago

Fixed by a06bdc0

Thank you.