RobertBeckebans / RBDOOM-3-BFG

Doom 3 BFG Edition source port with updated DX12 / Vulkan renderer and modern game engine features
https://www.moddb.com/mods/rbdoom-3-bfg
GNU General Public License v3.0

Less GPU-demanding soft shadows #211

Closed: escheffel closed this issue 9 years ago

escheffel commented 9 years ago

Hi there,

I suppose my request is a feature enhancement request. Some of us like to play Doom3 on integrated GPU hardware. For example, I am playing it on my A8-7100 with its integrated Kaveri R5 GPU (which only has 256 cores at about 500 MHz each), and frame rates are very smooth without soft shadows.

From the README I gather that soft shadows are implemented using taps (samples?), which you have defaulted to 16 to make shadows prettier on modern 2014 hardware. Could this be exposed as an option so that users can choose how many taps to use?

In the README it also states that using only one tap quadruples the frame rate when using soft shadows. I am looking for such an option, because with the current implementation I only get 23FPS on average which is a bit too low for my liking.

Btw, I am aware of the "r_ShadowMapSamples" console option and tried changing it inside the game to different values. But it seems as if this isn't really affecting anything, neither visually nor in terms of frame rate, which is odd. Thanks - Eric

mclark4386 commented 9 years ago

It looks like 16, 4 and 1 are the only values that are used. Have you tried those exact values?

RobertBeckebans commented 9 years ago

r_shadowMapSamples does not work anymore. The shader in base/renderprogs/interactionSM.pixel uses a fixed-size Poisson disc algorithm with 12 taps. It is similar to the approach described in some CryEngine 3 docs.

It is described on pages 24-25: http://de.slideshare.net/TiagoAlexSousa/secrets-of-cryengine-3-graphics-technology

I tried many different approaches. The initial approach was a filter similar to Rage with 16 taps, which caused 32 texture lookups. The README is outdated. The current implementation requires 13 texture lookups with the same image quality. Anything less looked worse than stencil shadows, so reducing the number of taps is not an option.

Filling the depth buffers for the shadow maps is not a bottleneck (it ran at 400 fps on my GTX 660 Ti), but it is still expensive to filter the shadow maps for each light interaction. Doom 3 still has more dynamic shadowing lights than many other games. It is that simple.
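To make the "12 taps, 13 texture lookups" figure concrete, here is a minimal CPU-side sketch of a rotated Poisson disc shadow filter. It is not the code from interactionSM.pixel: the disc offsets, the `CountingTexture` helper, the `soft_shadow` function and the `radius` and `bias` parameters are all hypothetical and only illustrate the general technique, assuming the thirteenth lookup is the jitter texture fetch used to rotate the disc.

```python
import math

# Hypothetical 12-point Poisson disc inside the unit circle
# (illustrative values, not the table used by the real shader).
POISSON_DISC_12 = [
    (-0.326, -0.406), (-0.840, -0.074), (-0.696, 0.457), (-0.203, 0.621),
    (0.962, -0.195), (0.473, -0.480), (0.519, 0.767), (0.185, -0.893),
    (0.507, 0.064), (0.896, 0.412), (-0.322, -0.933), (-0.792, -0.598),
]


class CountingTexture:
    """Tiny 2D texture with clamped nearest sampling that counts its lookups."""

    def __init__(self, texels):
        self.texels = texels
        self.height = len(texels)
        self.width = len(texels[0])
        self.lookups = 0

    def sample(self, u, v):
        self.lookups += 1
        x = min(max(int(u * self.width), 0), self.width - 1)
        y = min(max(int(v * self.height), 0), self.height - 1)
        return self.texels[y][x]


def soft_shadow(shadow_map, jitter_tex, u, v, receiver_depth,
                radius=0.05, bias=0.001):
    """Return the lit fraction in [0, 1] for one fragment and one light."""
    # One dependent lookup: a per-pixel value that rotates the whole disc.
    angle = jitter_tex.sample(u, v) * math.pi
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    lit = 0
    for dx, dy in POISSON_DISC_12:
        # Rotate the fixed offset, then scale it to the filter radius.
        ox = (dx * cos_a - dy * sin_a) * radius
        oy = (dx * sin_a + dy * cos_a) * radius
        occluder_depth = shadow_map.sample(u + ox, v + oy)
        if receiver_depth - bias <= occluder_depth:
            lit += 1
    return lit / len(POISSON_DISC_12)
```

In this sketch every call costs 12 shadow map samples plus 1 jitter sample, and that cost is paid per fragment for each shadow-casting light that touches it, which is why the filtering step rather than the depth pass dominates.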

nbohr1more commented 9 years ago

We are planning on using "Penumbra Wedges" in an upcoming Dark Mod release. Have you played with that technique?

http://www.terathon.com/gdc05_lengyel.pdf

escheffel commented 9 years ago

Thanks for all of the comments so far. I just wanted to be more precise about the situation I am facing. I currently own 2 Lenovo laptops: one 3 years old (Lenovo Y470) with an Intel i3 and a dedicated GT 550M with 2GB of DDR3 (96 Fermi shader cores at 1GHz+ each) on a 128-bit memory bus, and a very new one (Lenovo E555) with an AMD A8-7100 APU with an integrated R5 GPU (256 shader cores at about 500 MHz each), which uses system memory as VRAM over a 128-bit dual-channel bus. The CPU is an AMD Steamroller quad-core architecture, capped at 1.8GHz per core.

When running Doom3 in Linux and starting a game from scratch in Mars City, I focus my attention on the scene right at the beginning where the character receives a bio-scan in the entry chamber. In this scene light is filtered from the ceiling through 2 rotating ventilator shafts, which create a moving shadow on the floor that looks quite complex for soft shadows. My 3-year-old laptop with the "old" GPU plays this scene smoothly at about 32FPS (RB:12.5, GPU:24), but my newer laptop with the integrated AMD APU grinds to a halt in this scene at about 12FPS (RB:55, GPU:75).

rbdoom-3-bfg-20150107-160609-001

I find this quite puzzling. When launching toy benchmarks such as "glxgears" in Linux, my A8-7100 APU is substantially faster than my older NVidia-based laptop, but when it comes to rendering complex soft shadow scenes in Doom3, the newer AMD APU laptop cannot keep up, which I think should not be the case. In fact, the AMD A8-7100 performs awfully with soft shadows switched on.

I have been wondering to myself... could it be that rendering soft shadows is also CPU-bound? Might the AMD CPU capped at 1.8GHz be at least partially the culprit for the unusually low frame rate? (See my following comment.) It may be useful to know that when both laptops run Doom3 without soft shadows, the newer Kaveri-based laptop is about as fast as the older one with the dedicated Fermi-based NVidia GPU. Both laptops were tested running Doom3 in full-screen mode at 1366x768 with anti-aliasing switched completely off.

Does anyone know why soft shadows cause this massive decline in performance on the A8-7100? Maybe the answer is simply that the 96 Fermi cores outpace the 256 AMD cores due to their substantially higher clock speed. It is still puzzling, though, that the NVidia GPU is almost 3 times as fast as the Kaveri R5 in complex dynamic shadow scenes.

escheffel commented 9 years ago

The plot thickens and turns even more peculiar. I just launched Doom3 on a 64-bit Linux Intel i7 workstation equipped with a dedicated AMD R9 280X GPU with 3GB of GDDR5 VRAM. This GPU is a re-branded Tahiti HD7970 with 2048 shader cores, so still quite a beast by many standards. When launching into Mars City (at 1600x900 without any anti-aliasing), frame rates are unsurprisingly consistent at 120FPS (locked).

But in the same bio-scan chamber scene described in my earlier post with the dynamic shadows cast from the 2 ventilator shafts on the ceiling, the frame rate drops to exactly 55FPS(RB:17.2, GPU:17.7). Surely this must be a driver bug in the AMD set up!! This machine configuration is a beast by Doom3's hardware requirement standards - the frame rate should never drop by that much and it should also not be that taxing on the RB and GPU... (P.S. I have disabled Catalyst A.I. in the GPU settings).

Why are soft shadows so punishing on AMD and so forgiving on NVidia GPU hardware, irrespective of whether it is dedicated or integrated? (I am using the latest AMD Catalyst Omega 14.12 driver)
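For comparing the numbers quoted in the comments above, it can help to convert frame rates into per-frame time budgets. The sketch below assumes that the RB and GPU figures are milliseconds per frame, which the thread does not state explicitly; the helper names `frame_time_ms` and `gpu_share` are made up for this illustration.

```python
def frame_time_ms(fps):
    """Milliseconds available per frame at a given frame rate."""
    return 1000.0 / fps


def gpu_share(fps, gpu_ms):
    """Fraction of the frame budget taken by the GPU figure (assumed to be ms)."""
    return gpu_ms / frame_time_ms(fps)


# (label, reported FPS, reported GPU figure) as quoted in the comments above.
REPORTS = [
    ("GT 550M laptop", 32, 24.0),
    ("A8-7100 laptop", 12, 75.0),
    ("R9 280X workstation", 55, 17.7),
]

for label, fps, gpu_ms in REPORTS:
    budget = frame_time_ms(fps)
    share = gpu_share(fps, gpu_ms)
    print(f"{label}: {budget:.1f} ms per frame, GPU share {share:.0%}")
```

If the millisecond assumption holds, the GPU figure accounts for most of each frame in all three cases, which would point at GPU-side cost rather than the CPU.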

escheffel commented 9 years ago

I also tried to alter the source code myself and re-compiled it with the random Poisson discs using only 6 samples instead of 12. This indeed decreases the quality of the shadows noticeably, they become very noisy and look a bit like a cloud of black ants crawling on walls. Performance-wise there is no visible improvement on my Kaveri A8-7100 compared to the 12-sample implementation... not even one iota.

rbdoom-3-bfg-20150107-200108-001

RobertBeckebans commented 9 years ago

Maybe it is the dependent texture fetch above for the texcoord offset.

float4 jitterTC = ( fragment.position * rpScreenCorrectionFactor ) + rpJitterTexOffset;
float4 random = tex2D( samp6, jitterTC.xy ) * PI;

Does the performance improve if you replace

float4 random = tex2D( samp6, jitterTC.xy ) * PI;

with

float4 random = fragment.position;

?

escheffel commented 9 years ago

Hi there, Thanks for the suggestion. I tried this and re-compiled but all I get is somewhat distorted (shifted and banded) shadows and no perceptible improvement in performance. It just beats me that an aging laptop-based mobile GPU from NVidia manages to pump out decent FPS with soft shadows enabled, but that AMD GPUs take such a big performance hit. Perhaps I should try to revert the driver to some earlier versions.

nbohr1more commented 9 years ago

Yes, even vanilla Doom 3 has puzzling performance issues in current AMD drivers. We've been banging on their support team for a few months now about this...

escheffel commented 9 years ago

If vanilla Doom3 means Doom3 with only old-school stencil shadows enabled, then I have no reason to complain about the most recent AMD Omega GPU drivers. On my A8-7100 APU I get decent average frame rates of 50FPS or thereabouts (in some cases the FPS rises above 100) - not shabby for an IGP solution! It's just the added soft shadows which prove to be massive performance killers...

RobertBeckebans commented 9 years ago

The major difference in the bioscanner scene is that it contains alpha tested materials (ventilator shafts) that cause additional shadows through the flag r_forceShadowMapsOnAlphaTestedSurfaces 1. You can try r_forceShadowMapsOnAlphaTestedSurfaces 0 to create shadows only for interactions where stencil shadows would be created. Maybe the AMD driver has a problem with alpha tests during the shadowmap depth buffer filling.

This stuff is really difficult to track down. I usually have 2-5 milliseconds GPU time with shadow mapping and 2x MSAA enabled.

escheffel commented 9 years ago

Thanks for the suggestion. I should not have focused too narrowly on the bio scanner scene in my earlier descriptions. Soft shadow performance on my AMD hardware is actually comparatively poor throughout, whether alpha surfaces are involved or not. For instance, the L-shaped corridor one needs to pass through on the way to marine HQ, with the one engineer working on a console, also causes FPS to drop dramatically. This is the scene shown in the image in my earlier comment about poor shadow quality (crawling ants) when switching to 6-sample discs. I tried your suggested command in the bio scanner scene and the performance remains the same (i.e. bad and slow). I would also not want to switch off that parameter in any case, because soft shadows cast through translucent materials are a particularly pretty and realistic feature of your modification! The only reasonable explanation I can think of is that AMD's on-the-fly GLSL shader compiler does not optimize as aggressively as NVidia's. In some forums people do say something to this effect.

chasetheswift commented 9 years ago

Could it be as simple as poor AMD Linux drivers? I don't know if it would be convenient, but it would be worth testing the same machine in the same scene under Windows. Also, on an AMD APU the CPU and GPU share one die, like the Intel integrated series, so an older dedicated graphics chip will almost always have an advantage.

romulus2k4 commented 9 years ago

There is definitely an issue with soft shadows and AMD drivers. I have been experiencing sudden frame drops right in front of doors for no explicable reason on a HD 7950. It is definitely an AMD driver or an optimization issue.

escheffel commented 9 years ago

Anyways, without wanting to be a "spoilsport", I recently discovered the extensive modding community which has sprung up around the original Doom3 non-BFG edition, and some of the graphics mods are simply amazing and also work on the Linux version, although they are not always as well supported on that platform as on Windows.

At the moment I am using a combination of the "Wulfen" and a couple of other high resolution texture packs, combined with the modified shader routines sliced out of the "Sikkmod" mod. These allow much enhanced lighting and especially parallax occlusion mapping, which lends an amazing plasticity and depth to the high resolution textures, which are created internally with depth offset information.

In total the shader packages offer a couple of different simulated height displacement effects to choose from: Parallax mapping (cheapest and thus least taxing on the GPU, but causes "texture watering" at times - an annoying visual distortion), Parallax occlusion mapping (best and most accurate visual improvement, but because it uses ray-tracing it is quite expensive on the GPU), as well as Relief mapping and a modified parallax mapping called "Steep parallax mapping", both of which are also great but almost as taxing on the GPU as parallax occlusion mapping.

The bottom line is this. Once you have the original Doom3 and install a combination of these mods, especially slightly advanced ARB or GLSL shaders coupled with high-resolution textures, the graphical quality of Doom3 makes a massive and very, very enjoyable leap without crushing frame rates, not even on AMD hardware. At the moment, the subtle visual tweaks of the modified BFG edition offered on this site are very minute by comparison, they get nowhere near the graphical improvements one can attain when modding the original Doom3, including on Linux.

BielBdeLuna commented 9 years ago

yes, in time we might get those shaders to work on the BFG edition, but what the BFG has that vanilla doesn't is the multi-threaded renderer

romulus2k4 commented 9 years ago

Slightly off topic but @escheffel

image

image

How do you like these? Thanks to the hard work you folks have put in, I was able to accomplish these, with a little help from existing mods.

escheffel commented 9 years ago

@romulus2k4 these screenshots look very nice indeed and very much put some of my earlier comments about the Doom3 BFG edition to shame! Where can I download the mod files which would allow me to enjoy this lovely graphics spectacle? These look like high-res textures, but I suspect that because shaders have not been modified, parallax occlusion mapping is still off the table for now? @BielBdeLuna I actually did not know that the BFG edition uses multi-threaded rendering...

RobertBeckebans commented 9 years ago

I did a few fixes for modding support today. You might have a look into the git log. There is also a binary for testing: https://www.dropbox.com/s/8ldovg9mkftg1hl/RBDOOM-3-BFG-1.0.3-win32-20150225-git-64a12c1-preview.7z?dl=0

romulus2k4 commented 9 years ago

@escheffel here you go:

http://www.kot-in-action.com/forum/viewtopic.php?f=28&t=4436

You may run into the occasional crash with the Mega Mod; just post the exact crash behavior here and I will try to provide a fix. Simply deleting the monster_demon_archvile.def from the mod's def folder should fix most of the problems. If it wasn't for @motorsep I would never have been interested in modding at all. For the mod screenies posted above, you will have to wait a bit. Those are WIP and I intend to release them soon. They are very simple mods from vanilla DOOM 3 though, I just updated them to fully support BFG Edition. Again, many thanks to @RobertBeckebans and @motorsep

BielBdeLuna commented 9 years ago

this green smoke in the second screen-shot looks great! is it a coincidence or is it a cool smoke texture?

anyway it would be great to include the functionality of a radar in the HUD (or in the weapon as in that old mod) as an available functionality for further modding.

romulus2k4 commented 9 years ago

@BielBdeLuna the green plasma smoke you see is from Phrozo's green plasma mod, all I did was update the .def files to work with D3BFG.

romulus2k4 commented 9 years ago

On topic, this is my configuration:

Core i5 3470 @3.8 GHz; Gigabyte GA-Z77-D3H Motherboard; Corsair Vengeance 1600 MHz 4GB RAM with XMP profile selected; Sapphire HD 7950 3GB Vapor-X; Creative Sound Blaster Live! 24 Bit PCI Sound Card; Thermaltake Smart 650W Modular PSU; Western Digital 1TB HDD; Philips 191EL 19" Monitor (1366x768); Windows 8.1 up to date; Currently using AMD Catalyst Omega Drivers;

Whenever I am playing the game with Soft Shadows on, I get speed drops for no good reasons at all, even at a resolution of 1366x768! I don't get it, is it AMD's drivers that are to be blamed? These are some of the places where mostly the speed drops occur while using Soft Shadows:

[12 screenshots: rbdoom-3-bfg-20150226-124649-001 through rbdoom-3-bfg-20150226-125554-012]

The images above were slightly compressed using File Optimizer for shorter upload times (working with a less than 256 kbps internet connection isn't easy) and I am sorry if you find the number of images disturbing.

But the problem still remains: it is impossible to have a smooth gameplay experience on AMD cards with soft shadows on, regardless of the model of AMD graphics card used. Whenever a door is being opened, the framerate just drops (subtract a further 3 frames from the drops seen in the screenies if you have the flashlight on). I would also like to point out that a HD 7950 is much faster than a GTX 660 Ti OC (assuming the game is properly optimized). Also, the HD 7950 has a larger frame buffer (3GB of VRAM), so that shouldn't be the cause of a performance drop. @RobertBeckebans Any further thoughts on the matter?

escheffel commented 9 years ago

I am now 95% certain that the blame for the poor Linux OpenGL performance of AMD GPUs can be pinned on the drivers, which AMD never fully optimized for the Linux environment. This affects practically ALL games running on AMD GPUs in Linux. Check for example:

http://www.phoronix.com/scan.php?page=article&item=metro-redux-22gpus&num=1

which shows the performance of a large number of GPUs in Linux when running the Metro Redux games. AMD cards do quite badly here (with PhysX DISABLED of course) compared to the performance levels they can tap into under Windows. We can only hope that, because Steam and SteamOS increasingly focus on Linux, AMD will be forced to optimize their Linux drivers!

RobertBeckebans commented 9 years ago

I have pretty high hopes that a Vulkan renderer backend will solve all performance issues.

BielBdeLuna commented 9 years ago

that's great to hear.

romulus2k4 commented 9 years ago

I know this issue is closed, and I don't have much of an idea about what I am saying, but.. is there any possibility of implementing an AMD friendly shadowing technique like CHS, and will it benefit performance on AMD GPUs?

escheffel commented 9 years ago

@romulus2k4 I would not be surprised if this was possible (what is CHS?). Still, the main problem appears to remain poor driver optimization. Even an oldish Tahiti-based R9 280X (I own one) continues to be quite beefy hardware-wise after all these years, and for example in double-precision OpenCL compute tasks in Linux it is very fast and easily kills NVIDIA Maxwell GPUs (because of their crippled DP performance). There is nothing wrong with the hardware as such, it is just a shame that Linux OpenGL drivers cannot fully tap the hardware limits of those AMD cards. Apparently the same is not true for the Windows OpenGL support, which is better from what I have been hearing.

I never use Windows (not in 10 years!), neither for my professional work nor for occasional gaming. Because I do a lot of programming in scientific OpenCL compute tasks in both SP and DP floating point arithmetic, I employ an AMD R9 280X for that. But for gaming I use a new and shiny NVIDIA GTX 970. At the moment it is just the only acceptable way in Linux if you want to play games such as Dying Light or Metro Last Light Redux and if you use Steam.

At the moment the ONLY reason why I continue to own and use an AMD GPU in Linux is to carry out high-precision DP compute tasks in OpenCL (at 1 TFLOPS, not bad!), for which Tahiti GPUs remain VERY competitive even after 3 years in the market. For SP arithmetic NVIDIA Maxwells are just as good or better. I hope that AMD will continue to honour its devout OpenCL compute "fan-base" by making the Fiji 390 a DP OpenCL-compute monster as well, in spite of the energy-consumption implications.

NVIDIA Maxwells are a complete joke in DP arithmetic, because throughput has been crippled to 1/128 of SP!! With Kepler this was 1/3! With the latest Maxwell generation NVIDIA has decided to go down the "gaming-only" path which only requires massive SP throughput. Maybe they do this because they anticipate that in a few years the high-precision OpenCL scientific compute community will be tempted over to Intel and their "Xeon Phi" co-processors with 60+ cores at the moment, which soon will reach 100+ cores.
