Open bo3b opened 8 years ago
Adding another view, this is Inclusive Samples %, which shows that some exceptions are costing a fair amount of time.
I have not looked closely at the code, but I have seen a lot of out-of-range exceptions when debugging in debug builds. If these are expected runtime paths rather than genuine errors, it would be best to avoid them if at all possible, as exception handling in the MS RTTI runtime is very expensive. Most recommendations I've read are to compile with flags that remove exception handlers altogether. That seems too extreme to me, but we should definitely reserve exceptions for genuinely exceptional situations/crashes that we catch and move on from.
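To illustrate the kind of change I mean, here is a minimal sketch, not taken from the 3Dmigoto source. The OverrideMap type and both function names are hypothetical. It assumes the out-of-range exceptions come from container lookups such as at(), which I have not confirmed.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Hypothetical table mapping a hash to an override name (illustration only).
using OverrideMap = std::unordered_map<std::uint64_t, std::string>;

// Exception-based lookup: at() throws std::out_of_range on a miss, so every
// miss on a hot path pays for a full throw, unwind and catch.
const std::string *LookupThrowing(const OverrideMap &map, std::uint64_t hash)
{
    try {
        return &map.at(hash);
    } catch (const std::out_of_range &) {
        return nullptr;
    }
}

// Exception-free lookup: find() reports a miss through the end iterator,
// so the common "not found" case never enters the exception machinery.
const std::string *LookupNoThrow(const OverrideMap &map, std::uint64_t hash)
{
    auto it = map.find(hash);
    return it == map.end() ? nullptr : &it->second;
}
```

Both functions return the same results. The second one just keeps a miss on the normal control-flow path, which matters if misses are the common case.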
This is a profile using a freshly compiled texture_hash_rework branch, with identical ShaderFixes and identical d3dx.ini file. Seems to save roughly 0.7% CPU, which is very significant in CPU bound situations.
This profile does not include any airtime (the game was saved in a funny spot), but it does include explosions.
Possible performance regression.
Civ6 benchmark is showing a 30% drop from raw 3D to fixed:
No 3Dmigoto or fix: 38 fps
Fix installed: 27 fps
Plain 1.3.16, no fix, in ship mode: 34 fps
The fix itself is not particularly complicated and does not use regex. I think that historically our 3Dmigoto hit has not been this big. The most likely hit here in Civ6 is the use of a TextureOverride. No, performance is the same if it is commented out.
Does anything noteworthy show up on the summary or command list pages of 3DMigoto's built-in performance monitor?
I just had a look at the d3dx.ini - the TextureOverride doesn't itself cost much performance, but the hash lookup to find it might, particularly if that is happening many times each frame. It's worth trying to comment out the x2 = ps-t0 line instead to see how much of an impact that is having. This should show up in 3DMigoto's performance overlay.
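As a rough illustration of why a per-frame hash lookup can add up, and how memoizing it per resource would help, here is a minimal sketch. It is not 3DMigoto's actual code: TextureOverride, OverrideCache and the hashFn callback are all hypothetical stand-ins, and it assumes a resource's hash stays valid until Invalidate() is called.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for an override section parsed from d3dx.ini.
struct TextureOverride { int id; };

using OverrideTable = std::unordered_map<std::uint32_t, TextureOverride>;

// Remembers the lookup result per resource pointer, so the (potentially
// expensive) hash is computed at most once per resource, not once per use.
class OverrideCache {
public:
    // The table must outlive the cache; only a reference is stored.
    explicit OverrideCache(const OverrideTable &table) : table_(table) {}

    const TextureOverride *Find(const void *resource,
                                std::uint32_t (*hashFn)(const void *))
    {
        auto cached = cache_.find(resource);
        if (cached != cache_.end())
            return cached->second;          // hit: no hashing, no table lookup

        auto it = table_.find(hashFn(resource));
        const TextureOverride *result =
            it == table_.end() ? nullptr : &it->second;
        cache_.emplace(resource, result);   // misses are cached too
        return result;
    }

    // Call when a resource is released or its contents change.
    void Invalidate(const void *resource) { cache_.erase(resource); }

private:
    const OverrideTable &table_;
    std::unordered_map<const void *, const TextureOverride *> cache_;
};
```

Whether something like this would pay off depends on how often the same resource is looked up per frame, which the performance overlay should make visible.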
Also, this fix is using the software mouse, which has some known inefficiencies and has shown up in our profiling before (however masterotaku previously reported that it made no fps difference in his test cases despite that). There is room to add caching to the software mouse to dramatically reduce its performance impact.
This fix has shipped with the 3DVision2SBS shader enabled, and this is the old version without performance fixes, so it will be having an impact - more so if the user is using SLI.
Far Cry 4, top of tree as of 3/12/16, including latest fix from 3Dmigoto/fc4.
Tested on SLI 680 (GTX 690) GTX 760 PhysX, i5@3.8GHz, 12G RAM, 720p output in stereo, Driver 361.91, Win7+evilUpdate.
I use Exclusive Samples % as the most interesting sort, because it shows time actually spent in our code, as opposed to including things we call. Hash calculation for textures is definitely eating some CPU, up from the 0.8% CPU we'd see otherwise. The append_hw at 1% CPU is part of the crc32c calculation.
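For reference, this is roughly what the crc32c calculation computes for each byte of texture data. The sketch below is a portable bitwise version written for illustration, with a hypothetical function name. It is not the append_hw routine from the profile, which I assume from its name to be a hardware-accelerated path and which would be far faster than this loop.

```cpp
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32C (Castagnoli), using the reflected polynomial 0x82F63B78.
// Passing the previous return value as `crc` continues a running checksum.
std::uint32_t Crc32cSoftware(const void *data, std::size_t length,
                             std::uint32_t crc = 0)
{
    const unsigned char *p = static_cast<const unsigned char *>(data);
    crc = ~crc;                               // initial value 0xFFFFFFFF
    for (std::size_t i = 0; i < length; ++i) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right by one, XORing in the polynomial when the low bit is set.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;                              // final XOR with 0xFFFFFFFF
}
```

The standard check input "123456789" yields 0xE3069283. The cost is linear in the number of bytes hashed, so large or frequently re-hashed textures are what make this show up in a profile.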
Possibly useful saved perf report. I think these open in free versions of VS. (Zipped for Github, includes vsps file.)
FC4_1.2.35.zip
The overall module summary:
Individual hot spots in code:
MapTrackResourceHashUpdate
MapUpdateResourceHash
HackerContext::BeforeDraw
HackerContext::SetShader
HackerContext::FrameAnalysisLog
HackerContext::SetShader