andrei-drexler / ironwail

High-performance QuakeSpasm fork
GNU General Public License v2.0

extreme performance case FYI #35

Closed neogeographica closed 2 years ago

neogeographica commented 2 years ago

Got a performance test case if you're interested.

It's this conversion/episode thing that is in prerelease: https://www.quaddicted.com/forum/viewtopic.php?id=1171

Lots of big environments, and not fully vis'ed yet I think (or vis'ed at all). When launching it these args are recommended: +gl_farclip 128000 +sv_protocol 999 -heapsize 600000

If you use the console to do "map tavistock" and look toward the open area, with Ironwail I get about 5 FPS with r_softemu 3 or 10 FPS with r_softemu 0. Compared to 25 FPS with Quakespasm-Spiked.

Might just be the way it goes because of the quirks of this thing, and perhaps not relevant to most Quake maps, but maybe it's interesting?

By comparison at the start of ad_tears I get 240 FPS with r_softemu 3 and 320 with r_softemu 0.

OS is elementary OS 5.1.7, based on Ubuntu 18.04.6. CPU is a quad-core i7-920 at 2.67 GHz. GPU is a GeForce GTX 560 Ti.

Ironwail built from 0.4.0 source, just with "make" from inside the Quake subdir.

andrei-drexler commented 2 years ago
This is pretty much what the engine was made for, so I gave it a try. I'm using a Phenom II X4 955 (slightly lower single-threaded perf than the i7-920) running Win 7 (I know, I know) and for this test I've swapped out my GTX 1060 6GB for a Radeon HD 7850 (which is a bit faster than the 560 Ti, but not by a whole lot). The 7850 supports bindless textures, though, which the 560 Ti doesn't, so I've also tested with -nobindless. My results are quite a bit different than yours, and definitely more in line with my expectations:

| QSS | IW -nobindless | IW |
| --- | --- | --- |
| 15 fps | 139 fps | 182 fps |

Wild guess, but maybe the driver is emulating functionality on the CPU. Any chance you could try a proprietary one?

PS the impact r_softemu 3 has is way higher than it should be, too (it's 564 vs 516 on the 7850 at the start of ad_tears).

neogeographica commented 2 years ago

FWIW yeah I'm using proprietary NVidia drivers, the last ones that were good for this card I think ... 390.144.

(condump attached with gl_info output)

condump.txt

andrei-drexler commented 2 years ago

Thanks, but unfortunately there's not much info in there that can actually help, I only used it in my screenshots to show the GPU I was testing on. Luckily, there's opengl.gpuinfo.org - it doesn't have the exact same driver version, but it's close enough. Anyway, one thing that could be causing this would be running out of VRAM - the 560 Ti has 1 GB, and the 7850 I tested on has 2. imagelist reports 588 MB, but many of the textures are not powers of 2, so the actual memory usage could be quite a bit higher than that. The impact shouldn't be anywhere near this drastic, but it's pretty clear that something that "shouldn't" be happening is. Could you try gl_picmip 4 before loading the map to see if that improves things? 4 is really overkill, 1 should be enough, but this will make it more clear if it's VRAM or not.
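
To make the non-power-of-2 concern concrete, here is a minimal sketch of how much extra memory a texture could take if a driver padded each dimension up to the next power of two. Whether a given driver actually does this is an assumption, and the function names are hypothetical, not Ironwail code. It assumes uncompressed RGBA8 (4 bytes per texel) and ignores mipmaps.

```c
#include <assert.h>
#include <stdint.h>

/* Round v up to the next power of two (returns v if it already is one). */
uint32_t next_pow2(uint32_t v)
{
    uint32_t p = 1;
    assert(v >= 1 && v <= 0x80000000u);
    while (p < v)
        p <<= 1;
    return p;
}

/* Bytes for the base level of an RGBA8 texture stored at its exact size. */
uint64_t tex_bytes_exact(uint32_t w, uint32_t h)
{
    return (uint64_t)w * h * 4;
}

/* Bytes for the same texture if each dimension were padded to a power
   of two. This padding behavior is an assumption about the driver. */
uint64_t tex_bytes_padded(uint32_t w, uint32_t h)
{
    return (uint64_t)next_pow2(w) * next_pow2(h) * 4;
}
```

Under these assumptions a 320x200 texture would occupy 524288 bytes instead of 256000, roughly double, which is the kind of gap that a reported total like imagelist's would not show.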

neogeographica commented 2 years ago

Ah, yeah. Actually 1 doesn't seem to help, 2 maybe a tiny bit, but 3 and 4 push things over the hump into 200+ FPS.

andrei-drexler commented 2 years ago

Hmm... 3 sounds rather extreme, but at least the mystery is solved now. I'll see if I can optimize lightmap memory usage, that's where most of the VRAM difference compared to QSS comes from.

andrei-drexler commented 2 years ago

One more thing that could help: gl_fullbrights 0 (since many of the fullbrights seem to be unintentional, anyway). This might make gl_picmip 2 usable, or it might not have an impact at all, depending on the driver.

neogeographica commented 2 years ago

Even picmip 1 is pretty good w/ fullbrights off, hits around 80-90 FPS.

Let me know if there's anything else you want me to try, otherwise feel free to close this.

garoto commented 2 years ago

piggybacking this issue to mention that gl_fullbrights 0 disables id og sky textures.

andrei-drexler commented 2 years ago

> piggybacking this issue to mention that gl_fullbrights 0 disables id og sky textures.

Thanks, fixed in https://github.com/andrei-drexler/ironwail/commit/6cf0d959a254b99ed682281f5c744366db33fb46.

andrei-drexler commented 2 years ago

@neogeographica could you give the latest code a try? Lightmap VRAM usage is much lower now, so picmip 1 should be full-speed, and even 0 might actually be usable.

andrei-drexler commented 2 years ago

Found another ~180 MB between the sofa cushions. I think gl_picmip 0 could be full-speed now.

neogeographica commented 2 years ago

Got a bit of a Linux compilation problem with latest:

Compiling r_world.c
r_world.c: In function ‘R_AddBModelCall’:
r_world.c:469:67: error: ‘FALSE’ undeclared (first use in this function); did you mean ‘FILE’?
  flags = zfix | ((fb != NULL) << 1) | ((r_fullbright_cheatsafe != FALSE) << 2);
                                                                   ^~~~~
                                                                   FILE

Changed it to lowercase false in my build.
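
For anyone hitting the same error: FALSE is not a standard C identifier. It is a macro that Windows headers happen to define, which is likely why the line compiled on Windows but not with GCC on Linux. The sketch below mirrors the shape of the failing expression using the lowercase false from stdbool.h. The function and parameter names are hypothetical, and the engine's own boolean type may be defined differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Pack three conditions into bits 0..2, in the same layout as the
   expression from the error message. Hypothetical helper, not engine code. */
unsigned pack_draw_flags(unsigned zfix, const void *fb, bool fullbright_cheatsafe)
{
    assert(zfix <= 1); /* bit 0 is expected to be 0 or 1 */
    return zfix
         | ((unsigned)(fb != NULL) << 1)
         | ((unsigned)(fullbright_cheatsafe != false) << 2);
}
```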

So:

With gl_fullbrights 1, gl_picmip 1, r_softemu 3... quite good, minimum triple digits FPS on tavistock start, more usually 150+.

Change to gl_picmip 0, still about 10 FPS when looking toward open area. (Regardless of gl_fullbrights and r_softemu settings.)

BTW I noticed that the gl_picmip value is not written to ironwail.cfg, not sure if that's intentional.

andrei-drexler commented 2 years ago

> Got a bit of a Linux compilation problem with latest:

That was indeed supposed to be lowercase, fixed in https://github.com/andrei-drexler/ironwail/commit/518e4121742e9ced60e740a90efbfd9442ec2911. I guess I should probably set up CI.

> So:

Thanks for testing this. I'm kind of disappointed with those results, but there's not a whole lot I can do to further reduce VRAM usage. Fullbright textures already take half as much memory as in QS/QSS, and lightmap packing uses a more efficient algorithm (210 vs 264 256x256 blocks). The main difference is that all the lightmap blocks are grouped into a single texture (needed for GPU-driven rendering), which can make it harder for the driver to find a suitable memory block if VRAM is fragmented. Ideally the lightmap would be allocated before all the other smaller textures, but that's not how QS works and changing that would be pretty messy. One thing I can try, though, is to add support for texture compression. This would have the same effect on memory usage as gl_picmip 1, but it would look closer to gl_picmip 0. Loading times would take a hit, though.
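
As a rough sense of scale for the block counts quoted above, the following sketch converts a number of 256x256 lightmap blocks into bytes. The 4 bytes per texel used in the usage note is an assumption for illustration only, since the thread does not state the lightmap format. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { LM_BLOCK_DIM = 256 }; /* block edge length, from the 256x256 figure above */

/* Total bytes for a given number of lightmap blocks. */
uint64_t lightmap_bytes(uint32_t blocks, uint32_t bytes_per_texel)
{
    assert(bytes_per_texel > 0);
    return (uint64_t)blocks * LM_BLOCK_DIM * LM_BLOCK_DIM * bytes_per_texel;
}

/* Whole-number percentage saved going from `before` to `after` blocks,
   rounded down. */
uint32_t percent_saved(uint32_t before, uint32_t after)
{
    assert(before > 0 && after <= before);
    return (before - after) * 100 / before;
}
```

With the assumed 4 bytes per texel, 264 blocks come to 69206016 bytes and 210 blocks to 55050240 bytes, a saving of about 20 percent. The block-count ratio holds regardless of the texel format.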

And yes, gl_picmip is indeed not saved, same as in QS. I'm guessing it wasn't really meant to be used on a regular basis, just for testing.

andrei-drexler commented 2 years ago

Added basic support for texture compression in https://github.com/andrei-drexler/ironwail/commit/41f0d972680bc973134b39764375dbbdc50dc949 (enabled with gl_compress_textures 1).
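
The earlier claim that compression saves about as much memory as gl_picmip 1 can be checked with a little arithmetic. This sketch assumes uncompressed RGBA8 textures, a picmip that halves each dimension per step, and a compressed format that stores each 4x4 texel block in 16 bytes (the size used by formats such as BC3/DXT5). The format actually chosen in the commit is not stated in this thread, so treat that as an assumption. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Base-level bytes of an RGBA8 texture after a picmip-style downscale:
   each dimension is shifted right by `picmip`, clamped to at least 1. */
uint64_t rgba8_bytes_picmip(uint32_t w, uint32_t h, unsigned picmip)
{
    assert(picmip < 32);
    uint32_t pw = w >> picmip;
    uint32_t ph = h >> picmip;
    if (pw < 1) pw = 1;
    if (ph < 1) ph = 1;
    return (uint64_t)pw * ph * 4;
}

/* Base-level bytes of a full-resolution texture stored as 4x4 blocks
   of 16 bytes each (assumed format, see above). */
uint64_t block_compressed_bytes(uint32_t w, uint32_t h)
{
    assert(w > 0 && h > 0);
    uint64_t blocks_w = ((uint64_t)w + 3) / 4;
    uint64_t blocks_h = ((uint64_t)h + 3) / 4;
    return blocks_w * blocks_h * 16;
}
```

For a 256x256 texture, picmip 0 takes 262144 bytes, while both picmip 1 and the assumed compressed format take 65536 bytes. The compressed version keeps the full 256x256 resolution, which is why it can look closer to gl_picmip 0.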

neogeographica commented 2 years ago

Hey FYI... I've had to change graphics cards (monitor died -> needed to get a card that had DisplayPort support for new monitor) and my new card has 2GB of VRAM so it's not a good test subject for this anymore. :-) I do think the previous changes helped a bunch. On my end I don't have any reason to continue to keep this issue open; not sure if you want to close it vs. hold it open to discuss any more changes.

andrei-drexler commented 2 years ago

Sorry to hear about the monitor! Yeah, I was still thinking of maybe trying to change the order in which textures are loaded so that the lightmap gets allocated first, which might have helped the memory allocator make better decisions, but I have no way of testing that myself, and I don't think I want to hack the code if I can't confirm it actually helps. Closing this, at least until someone else with a 1 GB card wants to play Peril... Anyway, thanks for the report and for testing these changes.