db48x / emularity

easily embed emulators
GNU General Public License v3.0

Architecture Emularity to support filter processing on emulator framebuffers #74

Open mdrejhon opened 3 years ago

mdrejhon commented 3 years ago

Historically, Emularity has outsourced filter processing to the individual emulators.

However, Emularity is a metaphorical equivalent of an emulated monitor. It allows a web browser to "become the emulated display for emulators".

Thus, Emularity theoretically takes on the responsibilities of a monitor's motherboard and its processing algorithms. This GitHub suggestion treats Emularity from that perspective.

A real display/monitor/television is able to process refresh cycles for things like color processing or scaling. A real display/monitor/television does things without the original game or video source knowing.

Thus, for an emulated display, this one-way capability is easy to black-box and silo. Emularity (emulating a monitor) can add a filter-processing layer without needing any underlying modifications to individual emulators. It has lots of long-term pros:

Add Emulator Framebuffer Processing Filter Hooks To Emularity

This is a natural improvement for Emularity, given that Emularity allows a web browser to be the emulated display for an emulator. Emularity thus acts as a display conduit, with display responsibilities. This GitHub issue improves and future-proofs the display responsibilities of Emularity in light of increasingly GPU-capable web visitors.

The Good News: It's a bit easier than thought: Hidden "CANVAS"

As the author of TestUFO, I have extensive Canvas2D knowledge, and Emularity also uses the HTML5 <CANVAS> element. That is brilliant news: one can render to a memory bitmap instead of the screen, or point the renderer at a Bitmap object. You just point to a different context -- the emulator renderer doesn't even need to know it's no longer drawing to a visible browser element, but to an offscreen element that's mapped directly to a high speed bitmap that can be processed really fast in JavaScript. So this is probably a simpler change than one thinks.

Metaphorically, it is kinda like an offscreen CANVAS tag. Your hacked MAME merrily renders to it, and Emularity just reprocesses these frames as high-speed bitmaps to be displayed on the actual visible CANVAS. The algorithms I am now able to come up with all fit in the GPU horsepower of even a 5-year-old Intel GPU -- even a mere smartwatch GPU now has enough power for some of the algorithms, amazingly enough.

This makes it easier to add a middleman renderer to Emularity since it only requires unidirectional data flow -- no need for upstream communications (telling emulator to mute or whatever).
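The one-way data flow described here can be sketched as a small presenter that copies each emulator frame to the visible canvas and then runs a list of filters. This is only an illustration: `createFramePresenter` and the filter signature `(visibleCtx, source) => void` are hypothetical names, not part of Emularity, and the sketch assumes the emulator is already drawing into `source`.

```javascript
// Hypothetical sketch: none of these names exist in Emularity.
// `source` is the canvas the emulator draws into (hidden or offscreen);
// `visibleCtx` is the 2D context of the canvas the user actually sees.
function createFramePresenter(source, visibleCtx, filters) {
  return function presentFrame() {
    const { width, height } = visibleCtx.canvas;
    // Reset state that a previous filter may have left behind.
    visibleCtx.globalCompositeOperation = 'source-over';
    visibleCtx.globalAlpha = 1;
    // Base pass: scale the emulator frame to the visible size.
    visibleCtx.drawImage(source, 0, 0, width, height);
    // Each filter re-blends or draws on top of the visible canvas.
    for (const filter of filters) {
      filter(visibleCtx, source);
    }
  };
}
```

In a browser, `presentFrame` would be called once per emulator refresh (for example from a `requestAnimationFrame` callback). Nothing is ever sent back to the emulator, which is the point of the design.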

It might not be high priority now but this should at least be masterplanned this decade, as people buy 4K displays -- higher resolution displays -- or higher Hz displays -- which can benefit from moving some display processing to the centralized side.

Why filter processing should be more centralized long-term:

All web browsers (even on iPad and newer Android devices) can do real time interpolation with my ultralowlag interpolation algorithm (only 1 refresh cycle of latency) that I use at www.testufo.com/vrr -- provided the browser is GPU accelerated. The same algorithm can be used for fixed-framerate to fixed-framerate conversion -- aka standards conversion -- smooth 50fps PAL on 60Hz NTSC -- and smooth 60fps NTSC on 50Hz PAL. Or 100Hz or 120Hz or whatever.
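To make the standards-conversion idea concrete, here is a minimal sketch that assumes a plain linear cross-fade between the two emulator frames that bracket each display refresh. This is a stand-in, not the TestUFO algorithm (whose details are not given here); `blendPlan` and `drawBlended` are hypothetical names, and the frames are assumed to be opaque.

```javascript
// Assumption: linear cross-fade between bracketing frames.
// This is NOT claimed to be the TestUFO interpolation algorithm.
function blendPlan(refreshIndex, sourceFps, displayHz) {
  // Position of this display refresh on the source-frame timeline.
  const position = (refreshIndex * sourceFps) / displayHz;
  const earlier = Math.floor(position);
  return { earlier, later: earlier + 1, laterWeight: position - earlier };
}

// Draw the blend using only drawImage and globalAlpha (no per-pixel code).
// With opaque frames the result is (1 - w) * earlier + w * later.
function drawBlended(ctx, earlierFrame, laterFrame, laterWeight) {
  ctx.globalCompositeOperation = 'source-over';
  ctx.globalAlpha = 1;
  ctx.drawImage(earlierFrame, 0, 0);
  ctx.globalAlpha = laterWeight;
  ctx.drawImage(laterFrame, 0, 0);
  ctx.globalAlpha = 1;
}
```

For 50fps content on a 60Hz display, `blendPlan(3, 50, 60)` lands halfway between source frames 2 and 3. Because the later frame must already exist before it can be blended, a scheme like this costs about one frame of latency.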

Bitmap Mathematics Now Available In Single-Line JavaScript

All desktop browsers now have the necessary HTML5 extensions, and these capabilities are now generally autodetectable (bitmap mathematics operations where two bitmaps are mathed together in one line of code involving addition, subtraction, ANDs, ORs, XORs, brightness, contrast, saturation, alpha, etc). I've come up with algorithms that produce really usable mass benefits (e.g. HLSL simulation without shaders) and niche benefits (e.g. software BFI). By 2025, even bottom-barrel 10-year-old Android devices can do it at 60fps with full-retina-resolution CRT filter emulation, so there is no excuse to exclude central filter processing from Emularity.
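As an illustration of what a one-line bitmap operation can look like, the sketch below assumes that "bitmap mathematics" maps onto Canvas 2D compositing (`globalCompositeOperation` and `globalAlpha`). That mapping is an assumption on my part, and `bitmapMath` is a hypothetical helper name.

```javascript
// Assumption: bitmap math is expressed through Canvas 2D compositing.
// Blends `src` (a canvas, OffscreenCanvas or ImageBitmap) into `dstCtx`.
function bitmapMath(dstCtx, src, operation, alpha = 1) {
  dstCtx.save();
  // Examples: 'lighter' (add), 'difference' (absolute subtract),
  // 'multiply' (masking), 'xor' (alpha-based, not bitwise).
  dstCtx.globalCompositeOperation = operation;
  dstCtx.globalAlpha = alpha;
  dstCtx.drawImage(src, 0, 0);
  dstCtx.restore();
}
```

Usage would be a single call per operation, e.g. `bitmapMath(ctx, maskCanvas, 'multiply')`. Brightness, contrast and saturation can be applied in a similar one-liner through the context's `filter` property (e.g. `ctx.filter = 'brightness(120%)'`) in browsers that support it.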

Even my VRR emulation algorithm uses simple bitmap mathematics, no shaders, no per-pixel processing -- AFAIK this is the most JavaScript-efficient method of destuttering to fit a fixed-Hz emulator into a random-Hz browser visitor.

Why let downstream emulator modules do non-HTML5-optimized filter processing inefficiently, when it can be done properly in a centralized, JavaScript-optimized filter processing layer?

Relevant API References

From merely these two sets of APIs, I've been able to pull off miracles that formerly required GPU shaders.

For the performance naysayers -- currently, an RTX 3080 is capable of approximately 20,000 fullscreen bitmap mathematics operations per second between 1440p framebuffers (in Windows Chrome, GPU acceleration enabled), and a 5-year-old Intel GPU is capable of about 500 to 800 fullscreen 1080p bitmap mathematics operations per second in JavaScript -- fast enough for basic CRT filters using prerendered masks (a triad of monochrome masks) stored in bitmaps to composite realistic phosphor masks, via a final bitmap-addition operation of the R/G/B channels. It will not do curved filters, but it would emulate rectangular CRTs and their phosphor dots as accurately as MAME HLSL does, without requiring GPU shaders.

With 100% pure HTML5/JS, emulating the phosphor texture of a Sony Grand WEGA accurately would be easy on a 4K display with most recent NVIDIA/AMD GPUs in any current GPU-accelerated web browser (with spectacularly accurate texture on an OLED HDTV like an LG 4K HDTV, though even the 48CX is bigger than a 34" Sony Grand WEGA). One needs approximately 4 to 7 cascaded bitmap math operations per 60Hz emulator frame (original-resolution emulator frames) to get a near-MAME-HLSL-quality full-screen-resolution framebuffer. On detecting insufficient performance, the filter can automatically downrez (e.g. to a 1080p CRT filter) until the frame rate goes back up to the full emulator frame rate, but the full browser capability of a visiting user should be automatically milked, especially when the minimum common denominator means full-resolution filters can be used 90% of the time.
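The automatic downrez behavior could be driven by something like the governor sketched below. The scale steps, the 95% threshold and the name `createResolutionGovernor` are all illustrative assumptions; the sketch only ever steps down, to avoid oscillating between two resolutions.

```javascript
// Hypothetical controller: scale steps and threshold are illustrative.
function createResolutionGovernor(targetFps, scales = [1, 0.75, 0.5, 0.25]) {
  let level = 0; // index into `scales`; 0 means full-resolution filtering
  return {
    get scale() {
      return scales[level];
    },
    // Call once per measurement window with the frame rate achieved.
    report(measuredFps) {
      const tooSlow = measuredFps < targetFps * 0.95;
      if (tooSlow && level < scales.length - 1) {
        level += 1; // drop the filter resolution one step
      }
      return scales[level];
    },
  };
}
```

A caller would multiply the filter's working resolution by `governor.scale` after each report. Stepping back up after a quiet period is left out on purpose; a real implementation would probably want some hysteresis for that.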

Most likely, accurate resolution-adaptive CRT filters need a one-time prerender of precomputed filter bitmaps (~0.1ms to 100ms overhead, depending on system) upon every browser resize event (or zoom/unzoom of fullscreen), but other than that, the filter will largely autoadapt to the full pixel dimensions of the emulator box that Emularity runs in. It's shocking how fast the bitmap mathematics APIs now are in modern GPU-accelerated browsers, because in most of them they're just wrappers around shaders hardcoded in drivers. Driver performance does vary tons (and some open source contributions to improving open source drivers are kinda needed) but, fundamentally, it runs at nearly full memory speed -- and some GPUs can do almost a terabyte per second of bandwidth, which means tens of thousands of fullscreen bitmap mathematics operations per second!

In GPU accelerated browsers, an offscreen canvas now behaves just as fast as a 3D texture, and the common JavaScript bitmap math operations are now automatically run with GPU-optimized parallelization, at full GPU memory bandwidth speed (on-GPU, even for bitmap copy-to-copy and mathing).

Extrapolate this 5 years and you'll not be asking questions about the ability of tomorrow's smartphones and $200 Chromebook GPUs to do real MAME-quality HLSL CRT filters in JavaScript-optimized bitmap mathematics.

Granted, Linux and open source video drivers sometimes have low performance (if you clicked the TestUFO links on Linux, make sure you have GPU acceleration enabled for the animations to work correctly). That said, performance can typically be detected automatically. And performance is rapidly catching up. It's just a matter of time. I am impressed by the work of the kwin-lowlatency team, by the way.

And shaders can still be accommodated for even more realism later on, since CANVAS layers can be passed to WebGL for shader processing too, if desired. But one needn't worry about the complexity; the important thing is a slight rearchitecture of Emularity to optionally enable middleman processing.

Besides, self-monitoring of JavaScript performance can also automatically disable CRT filters that don't fit the performance budget of the particular emulator visitor, so we don't even need to worry about waiting for lowest-common-denominator capabilities. In the first 100ms of Emularity operation, it'll probably already know how fast its own JavaScript and bitmap mathematics are, and autoadapt accordingly, or the filter can be enabled optionally via a user hotkey (if a power user knows they're running a GPU accelerated browser). Brief tests (via TestUFO bitmap mathematics tests) show even an iPad now has enough power in its mobile GPU for retina-resolution MAME HLSL in pure non-shader bitmap mathematics JavaScript, which is downright impressive: witnessing MAME HLSL in realtime in a web browser.
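A startup self-test along these lines might look like the following sketch. `estimateOpsPerSecond` and `filtersAffordable` are hypothetical names, and the default pass count and headroom factor are guesses. One caveat I am confident about: GPU-backed canvas calls are queued asynchronously, so timing them from JavaScript measures command submission more than completion, and counting achieved frames over a short window is likely a more trustworthy signal.

```javascript
// Time a fixed number of compositing passes and estimate passes per second.
// `now` is injectable so the function can be exercised without a browser.
function estimateOpsPerSecond(runOnePass, passes = 50, now = () => performance.now()) {
  const start = now();
  for (let i = 0; i < passes; i++) {
    runOnePass();
  }
  const elapsedMs = Math.max(now() - start, 0.001); // avoid dividing by zero
  return (passes * 1000) / elapsedMs;
}

// Decide whether a filter chain fits the budget, with a safety margin.
function filtersAffordable(opsPerSecond, opsPerFrame, emulatorFps, headroom = 2) {
  return opsPerSecond >= opsPerFrame * emulatorFps * headroom;
}
```

With the figures quoted in this thread, a 7-operation filter at 60fps needs 420 operations per second before headroom, so a GPU measured at roughly 500 passes only with the headroom reduced, while one measured at 20,000 passes easily.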

Architecturing Requirements

This item doesn't specify filtering capabilities, but the eventual ability to add filters. For this item to be closed;

Most filter processing only needs the current framebuffer, but some temporal filters (e.g. standards converters -- e.g. de-stuttering 50fps on 60Hz displays -- or some advanced BFI algorithms) may need a lookbehind of one or two frames.
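The lookbehind requirement amounts to keeping a short history of frames. A minimal sketch, assuming frames are stored as objects such as `ImageBitmap` (whose `close()` method releases memory); the class name `FrameHistory` is hypothetical:

```javascript
// Keeps the newest `depth` frames; index 0 is the current frame.
// For a two-frame lookbehind, use depth = 3 (current + 2 previous).
class FrameHistory {
  constructor(depth) {
    this.depth = depth;
    this.frames = [];
  }

  push(frame) {
    this.frames.unshift(frame);
    if (this.frames.length > this.depth) {
      const dropped = this.frames.pop();
      // ImageBitmap objects expose close(); other frame types are skipped.
      if (dropped && typeof dropped.close === 'function') {
        dropped.close();
      }
    }
  }

  // Returns undefined when that many frames have not been seen yet.
  get(framesBack) {
    return this.frames[framesBack];
  }
}
```

A temporal filter would call `history.get(1)` or `history.get(2)` and fall back to the current frame while the history is still filling up.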

The emulator doesn't need to be aware of the existence of a filtering mechanism, and only needs to pass along emulator framebuffers at regular emulator refresh cycle intervals (just like it is already doing today, to a CANVAS) -- it's just that the CANVAS now becomes an offscreen / solo bitmap context (it's the same object per instanceof/typeof!) and is copied for filter processing. This is a unidirectional workflow for the emulator -- no need for Emularity to "communicate back" to the emulator, even during browser resize events.

For simple filters, the minimum processing latency will be the time taken by 1 or 2 bitmap copy operations (1/500sec on Intel, to 1/20000sec on RTX 3080). Filtering could also be disabled entirely (hand the emulator the context of the direct Canvas instead of the offscreen Canvas or Bitmap), so there is no mandatory copy overhead for the slowest browsers.

Mainstream benefits

Obviously, the more mainstream filters are more useful (better scaling for full screen mode), but the bottom line is that I think it's time to start thinking about a filter-processing architecture for Emularity -- masterplanning this towards 2025 even if it is impractical right away.

People are visiting websites with ever-better screens (spatial / temporal / color resolution), with ever-better performing browsers, with ever-better HTML5 capabilities.

I hope this presents a convincing argument to centralize filter processing / color adjustments / scaling algorithms that are fully HTML5-optimized etc.

Imagine, visiting Internet Archive, and seeing massively better quality emulators when resizing or clicking fullscreen button (more CRTlike scaling, rather than blurryscaling) -- and perhaps even destuttered. There are many benefits of Emularity-side filter processing.

mdrejhon commented 3 years ago

Updates, edited the above to add more information:

NOTE: @db48x -- If you have tried out a new Oculus Quest 2 standalone VR headset as I have, you'll quickly realize this is already practical -- Quest 2 is 1,000x better Holodeck immersion than Toy VR ViewMaster or Google Cardboard -- albeit encumbered by the Facebook requirement -- but it shows the potential of walking into an actual virtual reality arcade, written in standards-compliant web programming! Its in-VR browser already supports WebXR and is already powerful enough to run the Emularity emulators at Internet Archive (via its built-in overclocked fan-cooled Qualcomm Snapdragon 865 GPU), although there's no keyboard to control them unless you pair an external Bluetooth keyboard. Also, with a SideQuest sideload, I can force a 60Hz refresh rate (low persistence strobed -- CRT clarity mode), which means Sonic the Hedgehog scrolls perfectly CRT-clear with zero blur on the virtual floating screen in VR.

Ideally, I'd like Emularity to someday add support for the Oculus thumbsticks and A/B/X/Y buttons via the WebXR APIs, though this could theoretically be outsourced to a different JavaScript wrapper that calls specific Emularity APIs for controller input -- is there already a speedrunner/roborunner Emularity API of any kind for accepting API calls to simulate controller input? If so, then one can write a 50-to-100-line JavaScript wrapper (utilizing some existing toolkit such as webvr.js) that doesn't need any modifications to the Emularity main base itself.

Then I'd be able to test out playing those Internet Archive games at least in a floating browser window inside my Oculus Quest 2 headset, using just the Oculus GPU, without a computer, and controlling the already-able-to-run Emularity emulators using my Oculus controllers! It's a niche case, but it's so weird that everything runs perfectly and at full speed (most of the emulators) in a floating browser window inside my VR headset, yet I'm unable to play them using the joysticks I'm holding in my hands!
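The core of such a controller wrapper could be a function that turns changes in button state into key events. The sketch below is an assumption-heavy illustration: `BUTTON_TO_KEY` is a placeholder mapping (the indices follow the "standard" Gamepad layout, which differs from the layout WebXR controllers report), `diffButtons` is a hypothetical name, and whether a given emulator reacts to synthetic keyboard events is untested.

```javascript
// Placeholder mapping from gamepad button index to KeyboardEvent.code.
// Real key bindings depend on the emulator and are not known here.
const BUTTON_TO_KEY = {
  0: 'ControlLeft',
  1: 'AltLeft',
  12: 'ArrowUp',
  13: 'ArrowDown',
  14: 'ArrowLeft',
  15: 'ArrowRight',
};

// Compare the previous pressed-state array with a Gamepad-like object
// and return the key events needed to bring the emulator up to date.
function diffButtons(previous, gamepad, mapping = BUTTON_TO_KEY) {
  const current = gamepad.buttons.map((button) => button.pressed);
  const events = [];
  for (const [index, code] of Object.entries(mapping)) {
    const wasPressed = Boolean(previous[index]);
    const isPressed = Boolean(current[index]);
    if (isPressed !== wasPressed) {
      events.push({ type: isPressed ? 'keydown' : 'keyup', code });
    }
  }
  return { events, state: current };
}
```

In a browser, a polling loop would read the pad from `navigator.getGamepads()` (or from an `XRInputSource`'s `gamepad` property in a WebXR session), call `diffButtons`, and dispatch each result as `new KeyboardEvent(type, { code })` on the element the emulator listens to.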

Nonetheless, Emularity is a pretty impressive piece of work, and I realize large parts of it are not really being actively maintained at the moment because of the difficulty of porting non-JS emulators to JS -- but a theoretical uptick of maintainers could arrive with the simple addition of filter hookability and controller hookability -- a quick glance suggests this is a ~300-to-500 line change to Emularity (including one simple example filter for demonstration purposes).

For now just KISS and make Emularity capable of offscreen canvas, so it can be part of a filter chain to an onscreen canvas. It will unlock infinite possibilities in the era of improving browser capabilities/performance.

From my quick look at the Emularity source code, particularly loader.js, no changes to the emulator modules are needed, and the emulator framebuffer object is directly compatible with an offscreen canvas / context to a Bitmap object (it's actually the same object type) -- the very same object that can be processed at high speed in JS with the bitmap mathematics APIs.

There are three main workflows I can see potentially compatible with the Emularity workflow:

  1. Context to a hidden CANVAS element, perhaps with a CSS hidden style. Emularity copies it to a Bitmap every emulator refresh cycle, for reprocessing to a second CANVAS element that is visible.

  2. Context to a Bitmap object. Emularity saves the bitmap object every refresh cycle, and a new bitmap object is used in its place for the next emulator refresh cycle to render to. This requires a little more rearchitecturing, but improves efficiency a bit.

  3. Context to an OffscreenCanvas object. Emularity will easily work with this because it's a drop-in replacement in loader.js, so rearchitecture requirements become nearly zero.

For memory-constrained browsers, one can point to the onscreen context, bypassing the offscreen (the "filters disabled" mode). Either route works with any of the above.
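Choosing between these routes at startup could be done with plain feature detection, roughly as sketched below. `chooseRenderTarget` is a hypothetical helper, and it is an assumption (taken from the reading of loader.js above, not verified here) that the emulator will accept whichever canvas-like object it is handed. The `env` parameter exists only so the logic can be exercised outside a browser.

```javascript
// Pick the surface the emulator should draw into.
// kind: 'direct' (filters off), 'offscreen' (workflow 3) or 'hidden' (workflow 1).
function chooseRenderTarget(visibleCanvas, width, height, filtersEnabled, env = globalThis) {
  if (!filtersEnabled) {
    // "Filters disabled" mode: no extra copy, no extra memory.
    return { kind: 'direct', canvas: visibleCanvas };
  }
  if (typeof env.OffscreenCanvas === 'function') {
    return { kind: 'offscreen', canvas: new env.OffscreenCanvas(width, height) };
  }
  if (env.document) {
    // Fallback: a canvas element that is simply never attached to the page.
    const hidden = env.document.createElement('canvas');
    hidden.width = width;
    hidden.height = height;
    return { kind: 'hidden', canvas: hidden };
  }
  return { kind: 'direct', canvas: visibleCanvas };
}
```

Workflow 2 (a fresh Bitmap per refresh) is not covered here; in current browsers the closest built-in is probably `createImageBitmap(canvas)`, which returns a promise and therefore needs a slightly different frame loop.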

Some niggly small amounts of refactoring appear to be needed, but nothing too killer.

Metaphorically, it makes logical sense, because we're fundamentally software-emulating a display, and a display input is fundamentally a unidirectional receive-only device, so it follows that no upstream communications are needed. Framing programmerthink this way makes it much easier to conceptualize filter processing as a black box that the emulators don't need to know about (much like the processing in a display's motherboard).

Supporting filters is simply Emularity embracing its purpose: The ability to use a browser as the display output. We're just "enhancing" the emulator's display.

So I feel reasonably confident it is not a megaproject to at least "enable filter hooks" (even if you don't write the filters). Maybe no programmer resources can do it this year, but I think this should be done by 2022-2025 -- even as early as 2021.

mdrejhon commented 3 years ago

@db48x Do you know anybody who might take on the basics of filter hooking?

I am willing to donate code for a highly optimized WebGL-free JavaScript CRT filter emulator (JavaScript-optimized bitmap mathematics algorithms) that creates the approximate look of a square Sony Grand WEGA television set in terms of phosphor texture, using purely bitmap masking and bitmap mathematics operations involving addition/subtraction/alphablending, with adaptive resolution-matching uprezzing/downrezzing on browser window resize.

The main elements of a CRT filterlook would be there, including the scanliney look with the fuzzy fade around circular electron-beam pixeldots, and the subpixels of red/green/blue phosphor dots -- just like a real MAME HLSL filter, except it's simply ported to a bitmap-mathematics-operation workflow amenable to ultrahighspeed HTML5/JavaScript. It won't be as configurable as a MAME HLSL shader, but at least it's one exact specific MAME HLSL shader lookfeel visually ported to a high speed HTML5/JavaScript algorithm that only uses bitmap mathematics operations. For improved brightness, the phosphordots will be slightly enlarged/fuzzed and alphablend-overlapped onto adjacent phosphordots (RGB color combining). I might be able to have a constant (that might be adjustable by a future configurable slider or increment/decrement hotkey) that increases/decreases fuzz/focus as a brightness-versus-phosphordot-accuracy tradeoff (since the black pixels between CRT phosphordots will dim the emulation somewhat, necessitating a slight defocussing algorithm to brighten the CRT-filtered Emularity framebuffer). Nonetheless, I've verified that there's already sufficient browser performance to pull this off; that's the important part. It's just a bunch of AND/OR masks combined with the "ADD" bitmap mathematics operation to combine 3 separate phosphordot framebuffers (recombining the red, green, and blue phosphordot layers onto one final full-screen-resolution bitmap).

Bitmap->Bitmap (2 offscreen canvas) copy operations, Bitmap Scale (2 offscreen canvas) operations, and Bitmap Math (2 offscreen canvas) operations, are run at full GPU shader speeds at literally the speed of a 3D quad, even when called from the 1-line JavaScript, and there's enough bandwidth budget (~500 to ~20,000 fullscreen-resolution bitmap operations per second) on most recent proper GPU-accelerated browsers.

(Phosphor fuzz will be caused by prescaling the small emulator framebuffer before the bitmap masking / mathematics; but scaling from one Canvas to a 2nd offscreen canvas/bitmap is also an instant-operation on GPU-accelerated browsers)
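One possible reading of the prescale, mask and add steps described above is sketched here. It is an interpretation, not the donated code: it assumes each mask is already tinted in its phosphor color and sized like the output, uses the `'multiply'` composite mode as a stand-in for the AND-style masking, uses `'lighter'` for the final addition, and assumes opaque emulator frames. `compositePhosphorLayers` is a hypothetical name.

```javascript
// outCtx:     2D context of the final (visible) canvas
// scratchCtx: 2D context of a reusable offscreen canvas of the same size
// frame:      the small emulator framebuffer (canvas or ImageBitmap)
// masks:      pre-rendered, pre-tinted masks, e.g. [red, green, blue]
function compositePhosphorLayers(outCtx, scratchCtx, frame, masks, fuzzy = true) {
  const { width, height } = outCtx.canvas;
  // Start from black so the additive passes below sum cleanly.
  outCtx.globalCompositeOperation = 'source-over';
  outCtx.fillStyle = '#000';
  outCtx.fillRect(0, 0, width, height);
  // Smoothed scaling gives the fuzzy look; disabling it gives sharp pixels.
  scratchCtx.imageSmoothingEnabled = fuzzy;
  for (const mask of masks) {
    // 1. Prescale the emulator frame to output size.
    scratchCtx.globalCompositeOperation = 'source-over';
    scratchCtx.drawImage(frame, 0, 0, width, height);
    // 2. Keep only the light that passes through this phosphor mask.
    scratchCtx.globalCompositeOperation = 'multiply';
    scratchCtx.drawImage(mask, 0, 0);
    // 3. Add the masked layer onto the output.
    outCtx.globalCompositeOperation = 'lighter';
    outCtx.drawImage(scratchCtx.canvas, 0, 0);
  }
  outCtx.globalCompositeOperation = 'source-over';
}
```

With three masks this issues nine draw calls per frame, somewhat more than the 4 to 7 operations quoted earlier; if the masks are tinted anyway, merging them into one RGB mask and doing a single multiply would be a cheaper variant.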

My algorithm essentially takes:

All of these framebuffers stay in GPU memory, allowing them to be processed at full speed with existing HTML5 APIs that can do bitmap copy/scale/math ops. Clearly, browser performance is already here. Performance will vary per system (CPU, GPU) but it's already practical.

(In processing/memory constrained situations, just copy between the offscreen canvas to the onscreen canvas element, or only do the sharp/fuzzy scale steps and skip the phosphordot/aperturegrille algorithms).

The size of my dummy filter test code is surprisingly compact. I predict that the biggest code bulk is simply a JavaScript-based mask pregenerator (shadowmask/aperturegrille prerendered bitmap masks), which will need to run once per window resize (or emulator framebuffer resolution change, or fullscreen enter/exit) to properly rerender uprezzed/downrezzed versions of the CRT textures and then pre-insert alphablended scanline gaps into them, aligned to the pixel rows of the current emulator resolution.

High speed optimization of the mask-pregeneration step involves generating a small mask bitmap and tiling it along the whole large bitmap, since phosphordot/aperturegrille patterns are beautifully redundant along both the X and Y dimensions. But I even make the phosphordot look vary between vertically adjacent scanlines and horizontally adjacent pixels, so the pixels-not-aligned-with-mask look is realistically preserved! The resolutionless behavior of a CRT is beautifully mimicked this way, thanks to the sheer overkill of a 1440p or 4K desktop screen (or retina iPad) emulating the resolutionless nature of a VGA CRT and its phosphor texture completely independently of emulator resolution.

That said, to keep mask bitmaps small and use less memory, the pattern can safely repeat after X scanlines -- most people won't notice the phosphordot-vs-actualpixel alignment pattern repeating approximately 10 to 20 CRT scanlines apart or thereabouts. Some advanced spinoffs of some of my TestUFO tests (some commissioned) already use a form of elaborate mask precomputing (for purposes other than CRT masking). So there's some mask-precompute precedent, and my internal tests show that mask precomputes can be darn near instantaneous when optimized (literally in the 10-50ms league for non-shader precompute), hidden in the noise of an Emularity startup. Basically, it's a pixel-by-pixel precompute for a small bitmap, and then the precomputed mask bitmap is tiled across a larger bitmap that is then used in one-line bitmap mathematics operations. All of that can be done darn near instantaneously inside a window resize event, albeit sometimes with a penalty of more than one emulator refresh cycle, only for that instant of the browser resize event -- no penalty if the browser size stays constant. And mask precompute can become realtime when ported to WebGL in future (but I'm 100% avoiding WebGL at the moment).
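The small-tile-then-tile-it idea can be sketched as follows. The stripe layout, the scanline gap handling and the names `makeGrilleTile` and `tileMask` are illustrative assumptions (a plain aperture-grille pattern, far simpler than the varying phosphordot alignment described above); `createPattern`, `fillRect` and the `ImageData` constructor are standard Canvas APIs.

```javascript
// Pixel-by-pixel precompute of one small RGBA tile:
// three vertical stripes (R, G, B) with dark rows as the scanline gap.
function makeGrilleTile(stripeWidth, scanlineHeight, gapRows = 1) {
  const width = stripeWidth * 3;
  const height = scanlineHeight;
  const data = new Uint8ClampedArray(width * height * 4); // zero-filled
  for (let y = 0; y < height; y++) {
    const lit = y < height - gapRows ? 255 : 0; // last rows stay dark
    for (let x = 0; x < width; x++) {
      const channel = Math.floor(x / stripeWidth); // 0 = R, 1 = G, 2 = B
      const i = (y * width + x) * 4;
      data[i + channel] = lit;
      data[i + 3] = 255; // fully opaque
    }
  }
  return { width, height, data };
}

// Browser-side step: stamp a tile canvas across the full-size mask.
function tileMask(maskCtx, tileCanvas) {
  maskCtx.fillStyle = maskCtx.createPattern(tileCanvas, 'repeat');
  maskCtx.fillRect(0, 0, maskCtx.canvas.width, maskCtx.canvas.height);
}
```

In a browser, the tile's bytes would first be put on a small canvas with `tileCtx.putImageData(new ImageData(tile.data, tile.width, tile.height), 0, 0)` before calling `tileMask`. Only the tile is computed per pixel in JavaScript; the large fill is a single call.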

Yes, when the filter is enabled, it might temporarily lower the emulator framerate during dynamic continuous window resizing, especially if mask precomputes take more than 1 emulator refresh cycle. But it's a one-time filter bitmap precompute step per resize event (of either the original emulator resolution or the output browser resolution). A sudden slowdown to 20fps during active browser resizing isn't the end of the world, as it leaps back to full-framerate low-CPU-utilization after the resize, since the precomputed phosphordot/aperturegrille mask bitmaps are stored until the next resize or until the browser is closed.
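Storing the precomputed masks until the next resize is essentially a one-entry cache keyed by the two resolutions. A minimal sketch, with `createMaskCache` and the `buildMask` callback as hypothetical names:

```javascript
// Wraps an expensive mask builder so it only reruns when either the
// emulator resolution or the output resolution actually changes.
function createMaskCache(buildMask) {
  let cachedKey = null;
  let cachedMask = null;
  return function getMask(emuWidth, emuHeight, outWidth, outHeight) {
    const key = `${emuWidth}x${emuHeight}->${outWidth}x${outHeight}`;
    if (key !== cachedKey) {
      cachedMask = buildMask(emuWidth, emuHeight, outWidth, outHeight);
      cachedKey = key;
    }
    return cachedMask;
  };
}
```

Calling `getMask` every frame is then cheap; the builder only runs on the frames where a size has changed. Delaying the rebuild until a continuous resize settles would be a further refinement, but it is not shown here.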

I believe I should be able to volunteer this in my high speed JavaScript mask-bitmap pregenerator, as parameters/arguments

These could, in theory, be adjusted by buttons outside the emulator box in realtime while the emulator is running -- changing the parameters for the CRT filtermask precompute function. That probably won't be included in Internet Archive implementations, but it would allow developers who compile Emularity to test out various CRT filter masks, far beyond the quality/performance of current JavaScript-powered CRT-look emulators, and to decide which preconfigured CRT looks are desired (perhaps 4 hotkeys for Internet Archive for common square retro CRTs like a Mitsubishi TV, a NEC TV, a Sony Wega TV, a Sony PVM, as long as the screen is rectangular, since curvature is omitted for now).

P.S. No curved CRT initially. Although not part of the initial offer, a curved CRT could in theory be emulated later simply by adding a step that warps the emulator framebuffer (with the same amount of warp in the precomputed mask framebuffers). There are APIs to distort a bitmap like that, or it can be passed through a WebGL step (copy canvas to texture, then back), but no WebGL support is required if you don't need curved CRTs. The curving only needs to run once per emulator refresh cycle, and is also capable of GPU acceleration, so even curved tubelooks can be emulated later in JavaScript in full realtime at native 4K for browsers using current AMD/NVIDIA GPUs or sufficiently performant Intel GPUs / recent iPads / recent Macs (and probably Linux with certain window managers like kwin-lowlatency and others, combined with the proprietary NVIDIA drivers, although I'm not sure if Chrome/Firefox there has the same GPU performance as even an iPad).

Also, the bitmap-math operations have mirror texture-math operations (shader equivalents), so the algorithm is easily portable from pure HTML5 bitmap math to WebGL code later too, but I don't need any WebGL support to add non-curved CRT filter support.

My discovery that all the jigsaw pieces are in place -- the highspeed HTML5 APIs needed to mimic a specific MAME HLSL filter at full resolution now exist, and are sufficiently performant on fast modern GPUs -- makes this pretty tempting.

If I can talk to the person who wishes to improve Emularity by adding a filter hook, I'll be able to produce a TestUFO proof-of-concept of a simple CRT filter emulator using the above algorithm: a realtime full-resolution browser-resize-adaptive CRT filter that implements the look of one specific good MAME HLSL filter (in a fixed manner). It should be capable of realtime on most GPU-accelerated web browsers running on AMD/NVIDIA powered PCs/Macs as well as recent iOS/recent Galaxy devices (as well as a Linux kwin-lowlatency desktop or one of those high performance desktops + the fastest available NVIDIA/AMD Linux drivers).

Once that proof of concept is done, the person who adds filterhook support can continue modifying Emularity, and then move my JavaScript CRT filter code into Emularity. I'm surprised filterhooks aren't yet possible (otherwise I would have never posted the BFI suggestion).

Initially, I propose this can be a keypress-activation-only feature (for simplicity). Disabled by default, but can be activated if you know you have a fast-enough browser for the advanced CRT filterlooks. Three or four hotkeys for three or four different CRT looks.
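The keypress-only activation could be as small as the toggle sketched here. The key codes and preset names in `CRT_PRESET_KEYS` are placeholders with no real CRT parameters attached, and `createFilterToggle` is a hypothetical name.

```javascript
// Placeholder hotkeys and preset names; real bindings are undecided.
const CRT_PRESET_KEYS = {
  F6: 'preset-a',
  F7: 'preset-b',
  F8: 'preset-c',
  F9: 'preset-d',
};

function createFilterToggle(presetKeys = CRT_PRESET_KEYS) {
  let active = null; // null means filters are off, which is the default
  return {
    get active() {
      return active;
    },
    // Pass KeyboardEvent.code; returns the preset now active (or null).
    handleKey(code) {
      if (!Object.prototype.hasOwnProperty.call(presetKeys, code)) {
        return active; // not one of our hotkeys, nothing changes
      }
      const preset = presetKeys[code];
      // Pressing the active preset's key again switches filtering off.
      active = active === preset ? null : preset;
      return active;
    },
  };
}
```

In a page this would be wired up with something like `window.addEventListener('keydown', (e) => toggle.handleKey(e.code))`, with the render loop checking `toggle.active` to decide whether to bypass the filter chain.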