dosbox-staging / dosbox-staging

DOSBox Staging is a modern continuation of DOSBox with advanced features and current development practices.
https://www.dosbox-staging.org/

Wrapper-based 3DFX Emulation (Long-term goal) #3040

Closed OpenRift412 closed 12 months ago

OpenRift412 commented 1 year ago

As convenient and accurate as the current Voodoo emulation in the dev builds is, I think that, for the sake of performance across a range of hardware, there should also be an option to use external wrappers on the host system, such as dgVoodoo and nGlide. This would probably involve a sizeable amount of additional work, but I think it would be worthwhile for players wanting to emulate Glide DOS games on lower-end hardware. I'd consider this a long-term goal, since it's not a pressing issue.

MasterO2 commented 1 year ago

Anything like that would probably be for Staging 0.82 and beyond, given how many new features 0.81 is already going to introduce.

kcgen commented 1 year ago

It was a deliberate decision to disable the glide-to-host OpenGL pass-through feature because it had a lot more problems than the pure-software implementation.

It also requires OpenGL on the host side, so it poses a constraint amid the move to non-OpenGL renderers (like Vulkan).

The software implementation "flow" allows the frames to be handled just like all the other DOS video-rendered frames: they can pass through the video capture system, be shaded, scaled, and so on, using a single logical flow.

This is similar to how we handle audio: all output from the emulated audio devices (like fluidsynth and mt32) is passed through the mixer and can be recorded, volume adjusted, stereo swapped, filtered, and so on.
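To make the "single logical flow" idea concrete, here is a minimal Python sketch. It is not DOSBox Staging code: `Frame`, `capture`, `shade`, `scale_2x`, and `present` are hypothetical names invented for illustration. The point is only that a software-rendered Voodoo frame and a VGA frame go through exactly the same stages.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Frame:
    source: str   # which emulated device produced the frame (hypothetical label)
    width: int
    height: int
    history: List[str] = field(default_factory=list)  # stages applied so far


Stage = Callable[[Frame], Frame]


def capture(frame: Frame) -> Frame:
    # Stand-in for the video capture system seeing the frame.
    frame.history.append("capture")
    return frame


def shade(frame: Frame) -> Frame:
    # Stand-in for a shader pass.
    frame.history.append("shade")
    return frame


def scale_2x(frame: Frame) -> Frame:
    # Stand-in for a scaler that doubles both dimensions.
    frame.width *= 2
    frame.height *= 2
    frame.history.append("scale")
    return frame


# One pipeline shared by every frame source.
PIPELINE: List[Stage] = [capture, shade, scale_2x]


def present(frame: Frame) -> Frame:
    for stage in PIPELINE:
        frame = stage(frame)
    return frame


vga = present(Frame("vga", 320, 200))
voodoo = present(Frame("voodoo", 640, 480))
print(vga.history == voodoo.history)  # both sources took the same path
```

A pass-through renderer that draws straight to a host window would bypass `PIPELINE` entirely, which is the constraint described above.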

kcgen commented 1 year ago

Long term, I hope members of the community will see it as a fruitful area to explore more pure-software performance improvements, like SIMD (using lib Highway) and threading, which are technology-agnostic and portable (without being tied to a specific rendering technology).

Grounded0 commented 1 year ago

Can you remove [text] from the title? It makes the title difficult to read and does not conform to our ticket style. Thanks.

OpenRift412 commented 1 year ago

> It was a deliberate decision to disable the glide-to-host OpenGL pass-through feature because it had a lot more problems than the pure-software implementation.

> It also requires OpenGL on the host side, so it poses a constraint amid the move to non-OpenGL renderers (like Vulkan).

> The software implementation "flow" allows the frames to be handled just like all the other DOS video-rendered frames: they can pass through the video capture system, be shaded, scaled, and so on, using a single logical flow.

> This is similar to how we handle audio: all output from the emulated audio devices (like fluidsynth and mt32) is passed through the mixer and can be recorded, volume adjusted, stereo swapped, filtered, and so on.

> Long term, I hope members of the community will see it as a fruitful area to explore more pure-software performance improvements, like SIMD (using lib Highway) and threading, which are technology-agnostic and portable (without being tied to a specific rendering technology).

Makes sense, well hopefully we do see some more optimizations in the future, because I do like how faithful the current implementation is. :)

johnnovak commented 1 year ago

If OpenGL passthrough won't allow compositing the OSD on top of the output, that's a showstopper. Also, Voodoo is a small niche in DOS gaming and OpenGL passthrough is a niche within a niche. So this will only happen if 1) you pay someone to do the work and do it cleanly so the dev team accepts it, or 2) someone new joins the dev team who is personally interested in pouring hundreds of hours into this. It's like my work on getting double-scanning right: useful, but admittedly a niche thing, so it only happens if a maintainer has a very strong interest in the topic.

weirddan455 commented 1 year ago

I'd like to see actual performance numbers to see if this is worthwhile. Game compatibility is also a concern. DOSBox-X supports this feature. Maybe do some benchmarking and report back. There are theoretically big performance gains to be had, but in practice it may not work out so well.

This still won't guarantee anyone will pick it up but if there's not at least, say a 2x performance gain, it's almost certainly going to be more trouble than it's worth.

Lastly, another issue I'll point out is that many of the Glide wrappers are unmaintained, closed source, and Windows only (or some combination of the above). This was discussed in a previous issue regarding Voodoo.

kcgen commented 1 year ago

@schellingb added SSE2 and threaded triangle work (4 workers), which gave the software implementation enough of a boost to run relatively smoothly on semi-modern desktop hardware (even my old Skylake runs most stuff at 40+ FPS, which is great).
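As a rough illustration of the "4 workers" idea, the Python sketch below splits a framebuffer into contiguous bands of scanlines and hands each band to one worker thread. This is an assumption about one common way to partition raster work, not a description of how the actual Voodoo code divides triangles; `split_rows`, `fill_band`, and `render` are hypothetical names. Python threads also share the GIL, so this shows the partitioning only, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 4  # mirrors the "4 workers" mentioned above


def split_rows(height, workers):
    """Split rows [0, height) into contiguous bands, one per worker."""
    base, extra = divmod(height, workers)
    bands, start = [], 0
    for i in range(workers):
        end = start + base + (1 if i < extra else 0)
        bands.append((start, end))
        start = end
    return bands


def fill_band(framebuffer, band, width, colour):
    """Fill one band of scanlines; bands never overlap, so no locking is needed."""
    start, end = band
    for y in range(start, end):
        framebuffer[y] = [colour] * width
    return end - start  # number of rows this worker produced


def render(width, height, colour):
    framebuffer = [None] * height
    bands = split_rows(height, NUM_WORKERS)
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        rows_done = list(
            pool.map(lambda band: fill_band(framebuffer, band, width, colour), bands)
        )
    return framebuffer, rows_done


framebuffer, rows_done = render(width=8, height=10, colour=7)
print(split_rows(480, NUM_WORKERS), rows_done)
```

Because each worker owns a disjoint set of rows, the workers never write to the same pixel, which is what makes this kind of split easy to parallelise.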

> if there's not at least, say a 2x performance gain, it's almost certainly going to be more trouble than it's worth.

Yup - I think that's on the order of what the above did. It was enough to crest the "pain point" for typical hardware.

I haven't done profiling to see if there's any low-hanging fruit left, but with processors like Zen 4 and Apple's ARM64 really raising the bar (poor Intel, after dialing up wattage they still are left in the dust; time to do some work, boys! :smile:), we could very well be headed toward a "works fine for me, why bother with further improvements" scenario.

As an example, @FeralChild64's AMD system was so fast that he was maxing out 60 fps all the time (I believe it's a Zen 3 system?).

Another 2x or 4x boost could get SBCs like the Pi 4 and 5 cranking out very playable FPS, too, but I also don't want to pollute the code with ARM-specific NEON stuff, so I'd rather we go the lib Highway route and solve it once and for all. But yeah: someone really interested is going to have to dig in!

> another issue I'll point out is that many of the Glide wrappers are unmaintained, closed source, and Windows only (or some combination of the above). This was discussed in a previous issue regarding Voodoo.

The Windows-only, closed-source, and unmaintained nature of all those wrappers plagued the implementation of Voodoo on DOSBox in prior years: "Works for me, just run windows!" :face_with_head_bandage:

Thankfully software emulation rendered those contraptions moot; good riddance :wastebasket: !

johnnovak commented 1 year ago

My view: I'd consider a 2x speedup small potatoes. I'd expect a 10x speedup at minimum from a passthrough approach to make it worthwhile (say 6 FPS vs 60 FPS; pretty big difference). If current(ish) hardware can run authentic Voodoo emulation fast enough, we're lucky and can forget about the whole passthrough stuff.

Still very theoretical, as no one on the dev team cares much about that handful of Voodoo games 😄 Of all the Voodoo DOS games, I theoretically care about Archimedean Dynasty, Tomb Raider, and Redguard, and that's it, but even with these games I'm most likely perfectly fine with software rendering.

So yeah, this work needs some absolute Voodoo nut 😄

shermp commented 1 year ago

I have a sneaking suspicion it's alpha/transparency effects that murder the software emulation performance. I can reliably tank performance in Descent II just by bumping into the guide bot, which releases a shower of sparks.

kcgen commented 1 year ago

nice repro case, @shermp! Gotta grab a profile across a lot of spark showering, and see where all that CPU time is going.

shermp commented 1 year ago

Here's a short video showing the slowdown. Google drive link because it's too big to upload to GH.

https://drive.google.com/file/d/1z7L4h9cJMeUMFfI1cnOM180FbR3vHI7C/view?usp=share_link

GranMinigun commented 1 year ago

Please don't confuse the removed OpenGL backend in the Voodoo emulation patch with a pass-through to Glide wrappers, which is the request here.

And yes, overdraw caused by alpha transparency blending can hit hard, even when rendered in hardware.
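A small Python sketch of why blended overdraw is costly in a software rasteriser. This illustrates the general technique only; it is not a measurement of Descent II or of the Voodoo emulation code, and `blend`, `draw_opaque`, and `draw_transparent` are hypothetical helpers. Opaque geometry hidden behind something already drawn can be rejected by a depth test, whereas every blended layer has to read, blend, and write each pixel it covers.

```python
def blend(src, dst, alpha):
    """'Source over' blend for one 8-bit channel; alpha is in [0, 255]."""
    return (src * alpha + dst * (255 - alpha)) // 255


def draw_opaque(framebuffer, depth, pixels, colour, z):
    """Opaque draw with a depth test: already-covered pixels are skipped."""
    writes = 0
    for p in pixels:
        if z < depth[p]:
            depth[p] = z
            framebuffer[p] = colour
            writes += 1
    return writes


def draw_transparent(framebuffer, pixels, colour, alpha):
    """Blended draw: every covered pixel is read, blended, and written back."""
    for p in pixels:
        framebuffer[p] = blend(colour, framebuffer[p], alpha)
    return len(pixels)


SPARKS = 50              # hypothetical number of overlapping sprites
COVERED = range(16)      # each sprite covers the same 16 pixels

fb_opaque = [0] * 64
depth = [float("inf")] * 64
opaque_writes = sum(
    draw_opaque(fb_opaque, depth, COVERED, 255, z=1.0) for _ in range(SPARKS)
)

fb_blend = [0] * 64
blend_ops = sum(
    draw_transparent(fb_blend, COVERED, 255, alpha=128) for _ in range(SPARKS)
)

print(opaque_writes, blend_ops)  # pixel writes vs blend operations
```

In this toy case the opaque path touches each pixel once, while the blended path does work proportional to the number of overlapping layers.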

weirddan455 commented 1 year ago

> Please don't confuse the removed OpenGL backend in the Voodoo emulation patch with a pass-through to Glide wrappers, which is the request here.

Do you know why this was removed? I know N64 emulators for example do low level emulation of the video card but use OpenGL or Direct3D to do the actual drawing on the host for performance reasons. It seems like that may be the best option.

> It also requires OpenGL on the host side, so it poses a constraint amid the move to non-OpenGL renderers (like Vulkan).

I kind of hate this trend, although really Apple is the only one declaring OpenGL deprecated. They don't even support Vulkan officially. They want you to use Metal. OpenGL is the only API that both works cross-platform and supports older hardware. You're not going to get Vulkan on a Raspberry Pi for example.

The main selling point of Vulkan is performance but that's mostly only realized for complex AAA style 3D games. It also comes at a cost of being a lower level, harder to program for, API. For our purposes, OpenGL is just fine. Plus we already require it for the CRT shaders.

I will note that I'm interested in SDL3's new GPU API as that develops. That will allow us to write shaders in a platform-agnostic way as it will target multiple APIs on the backend (OpenGL, Direct3D, Vulkan, Metal). That may end up being a perfect fit for DOSBox in the future.

GranMinigun commented 1 year ago

It was removed because it was incredibly hacky and buggy, creating its own window and forcing rendering resolution to its size no matter what. No shaders, too. That's off the top of my head.

> I know N64 emulators for example do low level emulation of the video card but use OpenGL or Direct3D to do the actual drawing on the host for performance reasons.

You might want to refresh the data, as currently the most accurate emulators use one specific low-level Vulkan plugin, which utilizes compute shaders to do its work. For HLE graphics, there's still GlideN64 (which has nothing to do with 3dfx's Glide API, mind you).

Anyway. I don't really see any reason for parallel low-level emulation and high-level drawing. IBM PC emulation is vastly different from fixed-hardware console emulation. And unlike with most other hardware, games do not touch the Voodoo directly, as far as I'm aware. All talking is done via the Glide API. Same thing for Rendition Speedy3D (used in vQuake). If one cares about precise slowdowns, they're welcome to use low-level software emulation of the hardware. If one would rather have decent performance, whether with accurate visuals or not, then pass-through is the way to go, as wrappers could be developed independently and used not just in DOSBox.

> You're not going to get Vulkan on a Raspberry Pi for example.

Wrong. RPi has better support for Vulkan (1.2) than for GL (desktop is limited to 2.1, ES is limited to 3.1).

> I will note that I'm interested in SDL3's new GPU API as that develops.

First of all, if it even gets an OpenGL backend, it will likely be just desktop 4.3, as that's when compute shaders were introduced. DOSBox doesn't even really need all of 2.1 functionality, and I'd rather keep compatibility with Intel HD 3000/4000 for now. And no word of GL ES.

Second, as of now, the plan is to adopt RetroArch's Slang shaders. Don't get me wrong, SDL GPU does sound interesting, but the situation with shaders is problematic: there's spirv-cross to convert between GL, GL ES, and Vulkan, which makes Slang feasible in the first place; but for SDL's API, one would have to specifically port their shaders.

Of course, nothing's set in stone, but that's how I see the current overall situation.

johnnovak commented 1 year ago

You see the situation correctly @GranMinigun.

I wouldn't touch SDL's GPU attempts with a ten-foot pole. Immature, new tech that exists for no good reason. All they can hope for is to play catch-up.

kjliew commented 12 months ago

It's estimated to be more than 2x, but it really depends on whether any 3Dfx DOS games actually need that kind of performance. For 30 FPS gameplay on a 65W+ desktop-class CPU, @schellingb's multi-threaded VOODOO implementation is really good, much better than other "Trash"Boxes or "JUNK_PC"em fantastic Voodoo 3/Banshee emulation. 🤣

Here's the performance measurement data for reference. Take it with a grain of salt, as the workload isn't realistic for most 3Dfx DOS games: it uses Quake 1.06 shareware with the DOS source port QDOSFX, which supports DOS 3Dfx rendering, and GLQuake 0.97 for wrapper-based pass-through on a Win98 VM with QEMU TCG.

QDOSFX command line: qdosfx -width 800 -height 600
Configured to viewsize 120, then timedemo demo1.

GLQuake can do everything one-shot from the command line: glquake -window -width 1024 +timedemo demo1 +viewsize 120

DOSBox SVN (Pure CPU-multithreaded) QDOSFX = 39.9 fps
DOSBox SVN (kekko VOODOO OpenGL) QDOSFX = 116 fps
QEMU TCG (qemu-3dfx Glide pass-through) GLQUAKE = 201 fps

Host system is Windows 11 22H2, Ryzen 5 7535U (20W TDP) thin-and-light laptop.
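For readers comparing these figures against the 2x and 10x thresholds discussed earlier in the thread, the short Python sketch below derives the speedup over the CPU-only result and the per-frame time from the three fps values reported above. The dictionary labels and helper names are made up for this example, and the GLQuake row comes from a different program running in a different VM, so the ratios are only indicative.

```python
# fps figures copied from the benchmark results above
results_fps = {
    "DOSBox SVN, CPU multithreaded (QDOSFX)": 39.9,
    "DOSBox SVN, kekko Voodoo OpenGL (QDOSFX)": 116.0,
    "QEMU TCG, qemu-3dfx Glide pass-through (GLQuake)": 201.0,
}

BASELINE = "DOSBox SVN, CPU multithreaded (QDOSFX)"


def speedup(fps, baseline_fps):
    """How many times faster than the baseline a result is."""
    return fps / baseline_fps


def frame_time_ms(fps):
    """Average time spent per frame, in milliseconds."""
    return 1000.0 / fps


baseline_fps = results_fps[BASELINE]
for name, fps in results_fps.items():
    print(
        f"{name}: {speedup(fps, baseline_fps):.2f}x, "
        f"{frame_time_ms(fps):.1f} ms/frame"
    )
```

On these numbers the OpenGL backend is a bit under 3x the software path and the Glide pass-through about 5x, both above the 2x threshold and below the 10x one mentioned in earlier comments.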

kcgen commented 12 months ago

> Here's the performance measurement data for reference.

I appreciate the benchmarks, but not the pathetic name calling:

[Screenshot: 2023-10-29_22-09]

It's rare to have to refer to our code of conduct; please read it:

https://github.com/dosbox-staging/dosbox-staging/blob/main/CODE_OF_CONDUCT.md

I welcome your involvement in the future. Regards.

Burrito78 commented 12 months ago

From my limited understanding of the subject, I can see the issues with this from miles away.

- Bugs, broken drivers, changes in the host OS, and upgrades will bite us in the behind long term; wrapper-based approaches are a mid-term solution that is destined to die somewhere down the road.
- Software emulation is slow, robust, and deterministic, and will automatically get faster with faster hardware; it's here to stay.

I'm closing this as not planned for now. If someone from the team thinks otherwise, don't hesitate to reopen!

johnnovak commented 12 months ago

> From my limited understanding of the subject, I can see the issues with this from miles away.

> - Bugs, broken drivers, changes in the host OS, and upgrades will bite us in the behind long term; wrapper-based approaches are a mid-term solution that is destined to die somewhere down the road.
> - Software emulation is slow, robust, and deterministic, and will automatically get faster with faster hardware; it's here to stay.

> I'm closing this as not planned for now. If someone from the team thinks otherwise, don't hesitate to reopen!

100%

Grounded0 commented 12 months ago

We can multithread this later to really make it fly, since it's 2023. I mean 2 to 4 worker threads.