TASEmulators / BizHawk

BizHawk is a multi-system emulator written in C#. BizHawk provides nice features for casual gamers, such as full screen and joypad support, in addition to full rerecording and debugging tools for all system cores.
http://tasvideos.org/BizHawk.html

Waterbox & LLVMpipe #3247

Open vadosnaprimer opened 2 years ago

vadosnaprimer commented 2 years ago

Since @nattthebear is non-trivial to catch on IRC, and while I still remember this, I guess we need a ticket.

libTAS uses LLVMpipe to simulate hardware graphics stuff in software and make savestates of that.

Force software rendering

This option forces the OpenGL device driver Mesa 3D to use its software implementation of OpenGL (llvmpipe) to run the game. While this makes the game much slower, it is usually required to be able to use savestates. Indeed, the state of the GPU cannot be easily accessed and stored into savestates by the tool, thus savestates are incomplete and may crash the game. When using Mesa's software implementation, all of the graphics pipeline is done by the CPU and can be stored in the savestate.

You must be using Mesa 3D to be able to use this option, which usually means that you must be using the free driver for your GPU (e.g. nouveau for nvidia GPUs or radeon for ATI/AMD GPUs).
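For reference, outside libTAS a similar effect can usually be requested through Mesa's own environment variables. The sketch below assumes a Mesa-based OpenGL stack; `LIBGL_ALWAYS_SOFTWARE` and `GALLIUM_DRIVER` are Mesa variables, while the helper names are made up for illustration and are not part of libTAS or BizHawk.

```python
import os
import subprocess


def software_rendering_env(base=None):
    """Return a copy of the environment that asks Mesa for llvmpipe.

    `base` defaults to the current process environment; it is never mutated.
    """
    env = dict(os.environ if base is None else base)
    # Mesa: force the software rasterizer instead of the hardware driver.
    env["LIBGL_ALWAYS_SOFTWARE"] = "1"
    # Mesa: pick llvmpipe among the available software Gallium drivers.
    env["GALLIUM_DRIVER"] = "llvmpipe"
    return env


def run_with_software_rendering(cmd):
    """Hypothetical helper: launch `cmd` (a list of arguments) under llvmpipe."""
    return subprocess.run(cmd, env=software_rendering_env(), check=False)
```

This only affects programs that go through Mesa; a proprietary GL driver ignores these variables.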

We know that waterbox can't snapshot GPU state, but would it be feasible to incorporate llvmpipe into waterboxhost so that cores that require a GPU (like Ruffle) could be ported?

nattthebear commented 2 years ago

Yes, this is a good idea. I've thought a bit about Mesa based rendering. LLVMPipe may be too advanced to support in the waterbox, but Mesa has other choices as well. I was always worried about speed; if it's glacial, then is it worth it?

vadosnaprimer commented 2 years ago

I'd love to see speed as the only problem for some things that are otherwise outright impossible to get into hawk.

InfamousKnight commented 2 years ago

From a TAS perspective, speed is of no concern as we tend to slow the game down anyways.

If it was like 1 frame every minute then it would be impractical.

nattthebear commented 2 years ago

If we're talking about arbitrary 3D cores like Dolphin or Citra, the speed very well may be that low. We can't assume anything about orders of magnitude here without research.

CasualPokePlayer commented 2 years ago

Both Dolphin and Citra have software renderers anyways. They would probably be faster than Mesa based rendering (assuming said renderers are optimized, which Dolphin definitely isn't anyways)

Ruffle (as mentioned) is the main thing that would greatly benefit from something like this, given that it has no savestates and no true software renderer (there is a separate question of whether Rust code can be waterboxed, but I imagine the answer is probably yes with some work).

nattthebear commented 2 years ago

Rust is in principle waterboxable. How fast is regular Ruffle with Mesa?

slamotas commented 2 years ago

Ruffle's speed is game dependent, but in libTAS with forced software rendering, I get about 0.5x-1x speed with Meat Boy and Super Mario Flash can get up to around 2x speed. Both Vulkan and GL run at around the same speed. This is not on a great computer either; I have Ubuntu running natively on an old laptop, it has these specs:

CPU: Intel® Core™ i5-4210M CPU @ 2.60GHz × 4

GPU: Intel® HD Graphics 4600 (HSW GT2)

getCursorsExe commented 2 years ago

LLVMpipe could be used for BlastEm core if you don't want to disable OpenGL?

CasualPokePlayer commented 1 year ago

Something to consider here (from LLVMpipe's page):

Also, the driver is multithreaded to take advantage of multiple CPU cores (up to 32 at this time).

Threads are junk in the box, so whatever speed measurements you end up getting with libTAS + forced software rendering are likely going to be way better than what the box will give you.
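A rough way to get a more pessimistic number outside the box is to time the same workload with llvmpipe's threading turned off. This is only a sketch: `LP_NUM_THREADS` is the llvmpipe variable for the rasterizer thread count (0 disables threading), the helper names are hypothetical, and a single-threaded run on the host is still not the same as running inside the waterbox.

```python
import os
import subprocess
import time


def llvmpipe_env(num_threads, base=None):
    """Build an environment for llvmpipe with a fixed rasterizer thread count."""
    env = dict(os.environ if base is None else base)
    env["LIBGL_ALWAYS_SOFTWARE"] = "1"
    # llvmpipe: 0 turns rasterizer threading off completely.
    env["LP_NUM_THREADS"] = str(num_threads)
    return env


def time_command(cmd, num_threads):
    """Run `cmd` (a list of arguments) to completion and return wall time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, env=llvmpipe_env(num_threads), check=True)
    return time.perf_counter() - start
```

Comparing `time_command(cmd, 0)` against a run with the default thread count gives an idea of how much of the measured speed comes from multithreading.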

CasualPokePlayer commented 1 week ago

As done in the melonDS core, you can waterbox code that interacts with OpenGL (possibly Vulkan too, but I haven't looked deeply enough into it). It does require some very liberal use of ECL_INVISIBLE and some magic to handle the runtime ABI differences: the core is sysv abi, but GL function pointers may be ms abi or sysv abi. And yes, this means committing the evil of calling function pointers directly instead of going through the standard callback wrapping, so there is some minor potential compromise to determinism since there is no stack marshalling; this is needed mostly because of the sheer number of GL function pointers. If there are readbacks, expect the obvious GPU-specific determinism, and maybe some non-determinism with savestates (readbacks are possible in Flash, so Ruffle would suffer from this). If there are no readbacks, this should generally be safe for determinism.

Also, LLVMpipe would internally use a JIT here. In principle a JIT can work in the box, but it generally needs modifications to work in the box at all (as some techniques will not work), and without modifications it will generally balloon the state size. Some JIT designs might not be usable at all within the box here.

nattthebear commented 1 week ago

~If we want JIT-in-the-box, it would be best to start with the simplest smallest JIT-capable core or lib we can find that does something useful. That would let us prove the potential value of the method before going too deep on something that's not clear to provide benefit.~ Edit: Never mind, guess I don't follow this much anymore, didn't know we already had working JIT.

vadosnaprimer commented 1 week ago

@zeromus what do you think?

CasualPokePlayer commented 1 week ago

JIT-in-the-box already exists in ares64. Again, it really depends on the JIT design. If the JIT just uses a small, fixed, preallocated pool of memory (rather than growing arbitrarily) and does not invoke any fastmem tricks (probably not relevant for LLVMpipe anyways, just other emulator JITs), then a JIT could work. "Small" means whatever you're comfortable putting into a savestate (it's 8 MiB total for ares64, 4 MiB for each of the 2 JITs). If you're comfortable making the pool invisible instead, then it could be rather large (though JIT invalidation might not work without desyncs for some JITs).
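To illustrate the "small fixed preallocated pool" idea, here is a toy sketch of a code cache whose savestate footprint is bounded by construction. It is not how ares64 or LLVMpipe implement their JITs; the class and method names are hypothetical, and real JITs store executable machine code rather than opaque bytes.

```python
class FixedCodeCache:
    """Toy JIT code cache backed by one fixed-size, preallocated pool."""

    def __init__(self, size):
        self.pool = bytearray(size)  # allocated once, never grows
        self.cursor = 0              # next free offset in the pool
        self.blocks = {}             # guest address -> (offset, length)

    def emit(self, guest_addr, code):
        """Store a compiled block and return its offset in the pool."""
        if len(code) > len(self.pool):
            raise ValueError("block larger than the whole pool")
        if self.cursor + len(code) > len(self.pool):
            self.flush()  # pool full: drop everything and recompile on demand
        offset = self.cursor
        self.pool[offset:offset + len(code)] = code
        self.cursor += len(code)
        self.blocks[guest_addr] = (offset, len(code))
        return offset

    def flush(self):
        """Invalidate all compiled blocks without changing the pool size."""
        self.pool[:] = bytes(len(self.pool))
        self.cursor = 0
        self.blocks.clear()

    def save_state(self):
        """Snapshot the cache; the pool part always has the same length."""
        return bytes(self.pool), self.cursor, dict(self.blocks)

    def load_state(self, state):
        """Restore a snapshot taken from a cache of the same size."""
        pool, cursor, blocks = state
        self.pool[:] = pool
        self.cursor = cursor
        self.blocks = dict(blocks)
```

Because the pool never grows, the worst-case savestate cost is known up front, which is the property being asked of a JIT here.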