bsnes-emu / bsnes

bsnes is a Super Nintendo (SNES) emulator focused on performance, features, and ease of use.

bsnes-hd upstream and development #16

Open DerKoun opened 4 years ago

DerKoun commented 4 years ago

Hi, this is DerKoun, of bsnes-hd. I just wanted to start this issue as a place to discuss anything you want to upstream from bsnes-hd, and also further development of HD Mode 7 and other HD features in bsnes and/or bsnes-hd. See https://github.com/bsnes-emu/bsnes/issues/12#issuecomment-614275965

Screwtapello commented 3 years ago

Thanks for starting this conversation, and sorry it's taken so long to respond.

I confess I'm not super-familiar with all the features bsnes-hd currently offers, but some people in the official bsnes/higan Discord have been posting screenshots of the smoothed gradients and window effects that look sweet.

As a community project, bsnes is probably not going to see much active development in the near future (definitely not as active as byuu was), but if the smoothed-gradients and high-res window effects aren't too invasive a change, and you (or anyone else) had the time to port them across, they seem like they'd be great additions.

In general, I'd like to maintain bsnes with a similar philosophy to the one byuu followed, so if there was anything in bsnes-hd he just wasn't interested in, that's probably still the case... but it's still worth asking about.

DerKoun commented 3 years ago

I have literally tried for months to get back to working on bsnes-hd and still haven't found the focus and time to actually get things done, so please don't apologize.

High-res windowing is experimental. It still looks bad in many situations, improving it is not a priority for me, and it's also unoptimized. In general, I'm sure someone with more knowledge about these things could significantly improve the 'smart' averaging of the coordinates in a very short time. I pretty much take the naive approach and then pile on tweaks that help some cases and break others. Still, as a PoC it allows for some pretty nice screenshots and videos, so I won't give up on it.

Smoothing the background colors required a fundamental change: I moved the conversion from 15-bit to 24-bit colors from "as late as possible" to "as early as possible", so the colors stored per Pixel object are now true color. I assumed that such a deep change would not make it upstream.
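To illustrate why converting early matters for smoothing, here is a minimal sketch, assuming SNES BGR555 colors (red in bits 0-4, green in 5-9, blue in 10-14) expanded to 8 bits per channel by simple bit replication. bsnes's real conversion goes through its own palette (which may apply color emulation), so `toRGB888`, `expand5` and `average` are hypothetical helpers, not bsnes or bsnes-hd code. The point is that the average of two neighboring 15-bit shades can land on a value that no 15-bit color can represent.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical true-color pixel value (what a Pixel would store after early conversion).
struct RGB888 { uint8_t r, g, b; };

// Expand one 5-bit channel to 8 bits by bit replication: 0 -> 0, 31 -> 255.
constexpr uint8_t expand5(uint16_t c) {
  return static_cast<uint8_t>((c << 3) | (c >> 2));
}

// Decode a 15-bit BGR555 color once, up front, into 24-bit true color.
constexpr RGB888 toRGB888(uint16_t bgr555) {
  return { expand5(bgr555 & 0x1f),
           expand5((bgr555 >> 5) & 0x1f),
           expand5((bgr555 >> 10) & 0x1f) };
}

// Rounded per-channel average of two true-color pixels; the result keeps
// 8-bit precision instead of being snapped back to a 5-bit step.
constexpr RGB888 average(RGB888 a, RGB888 b) {
  return { static_cast<uint8_t>((a.r + b.r + 1) / 2),
           static_cast<uint8_t>((a.g + b.g + 1) / 2),
           static_cast<uint8_t>((a.b + b.b + 1) / 2) };
}
```

With this expansion, red step 3 becomes 24 and step 4 becomes 33, and their average is 29, a shade between the two that a 15-bit pipeline cannot store.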

Speaking of structural changes, I plan to restructure the pixel arrays into parallel arrays for color, layer and priority, so that copying the colors can be optimized or even avoided. Am I right to assume that there is no interest in merging that upstream when it's done? Would you be interested in measurements of the results of the optimizations?
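A sketch of the two layouts being discussed, assuming a hypothetical `Pixel` with just a color, a layer and a priority (the real bsnes-hd pixel type may hold more or different fields). In the array-of-structs form, extracting the colors means striding over the layer and priority bytes; in the struct-of-arrays form the colors are already one contiguous buffer, so a single `memcpy` works, or the copy can be skipped entirely by handing out a pointer to that buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical array-of-structs layout: one object per pixel.
struct Pixel {
  uint32_t color;    // true color after early conversion
  uint8_t layer;     // which BG/OBJ layer produced the pixel
  uint8_t priority;  // priority used when compositing
};

// Hypothetical struct-of-arrays layout: one contiguous array per field.
struct PixelPlanes {
  std::vector<uint32_t> color;
  std::vector<uint8_t> layer;
  std::vector<uint8_t> priority;
  explicit PixelPlanes(std::size_t n) : color(n), layer(n), priority(n) {}
};

// AoS: gather colors one by one, skipping over the other fields.
void copyColors(const std::vector<Pixel>& src, uint32_t* dst) {
  for (std::size_t i = 0; i < src.size(); ++i) dst[i] = src[i].color;
}

// SoA: colors are already contiguous, so one bulk copy is enough.
void copyColors(const PixelPlanes& src, uint32_t* dst) {
  std::memcpy(dst, src.color.data(), src.color.size() * sizeof(uint32_t));
}
```

Both overloads produce identical output; the SoA version simply turns a strided gather into a bulk copy that the compiler and CPU handle well.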

So, sorry that none of the features are easy to upstream. Maybe we can chat about the optimizations at some point.

Screwtapello commented 3 years ago

If you're not happy/comfortable with high-res windowing in its current state, then yeah, pushing that upstream is probably a bad idea. When you say it "looks bad", how bad are we talking? Without looking into it myself, I'd expect that just interpolating from the values on line X to the values on line X+1 wouldn't look too bad - sure, it might get a bit clunky on the tops and bottoms of circles, but even well-regarded systems like xBR have situations where they look great and others where they make a bit of a mess. It's just a matter of letting people turn those enhancements off for games where it's not a net improvement.
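The "just interpolate from line X to line X+1" idea could look roughly like this minimal sketch. It assumes one window with an inclusive left/right edge per native scanline (0-255) and an integer scale factor; `WindowEdges`, `interpolateEdges` and `maskRow` are hypothetical names, not bsnes or bsnes-hd code. For HD sub-row `j` of native line X, each edge moves the fraction `j/scale` of the way toward line X+1, which in HD pixels is exactly `edge * scale + (nextEdge - edge) * j`.

```cpp
#include <cassert>
#include <vector>

// One window's edges on a native scanline, in native pixels (inclusive).
struct WindowEdges { int left; int right; };

// Naive linear interpolation of both edges for HD sub-row j (0..scale-1)
// between native line X (cur) and line X+1 (next). Result is in HD pixels:
// the HD x coordinate where each edge's native pixel block starts.
WindowEdges interpolateEdges(WindowEdges cur, WindowEdges next, int scale, int j) {
  auto lerp = [scale, j](int a, int b) { return a * scale + (b - a) * j; };
  return { lerp(cur.left, next.left), lerp(cur.right, next.right) };
}

// Build one HD row of the window mask. The right edge covers the whole
// scale-wide block of its native pixel, so sub-row 0 reproduces the native
// window exactly. An empty window (left > right) yields an all-false row.
std::vector<bool> maskRow(WindowEdges hd, int scale, int nativeWidth = 256) {
  std::vector<bool> inside(static_cast<std::size_t>(nativeWidth * scale), false);
  for (int x = 0; x < nativeWidth * scale; ++x)
    inside[static_cast<std::size_t>(x)] = x >= hd.left && x < hd.right + scale;
  return inside;
}
```

As expected, this does nothing special where a window appears or disappears between lines (the tops and bottoms of circles), so those rows would still look blocky or wrong.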

You're right about not being keen to rewrite the whole graphics pipeline from 15 to 24bit right off the bat, but I do wonder if there might still be benefit from interpolating in a 15bit mode — if the gradient changes slower than one tick per scanline:

...
0x03
0x03
0x04
0x04
...

then yes, you need higher-than-15bit colour to do interpolation, but if the gradient changes faster than one tick per scanline:

...
0x03
0x05
0x09
0x0D
...

...then interpolating even in 15bit could be beneficial. I don't know if any SNES games do draw gradients like that (copper bars?) but if they do, it might be worth it.
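A small worked example of the distinction above, treating the hex values as a single 5-bit color channel (so 0x0D = 13) and inserting one truncated midpoint between each pair of scanlines while staying at 15-bit precision. `interpolate5bit` is a hypothetical helper for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Insert one interpolated sub-line between each pair of native scanlines,
// keeping 5-bit-per-channel precision (integer midpoint, truncated).
std::vector<uint8_t> interpolate5bit(const std::vector<uint8_t>& lines) {
  std::vector<uint8_t> out;
  for (std::size_t i = 0; i < lines.size(); ++i) {
    out.push_back(lines[i]);
    if (i + 1 < lines.size())
      out.push_back(static_cast<uint8_t>((lines[i] + lines[i + 1]) / 2));
  }
  return out;
}
```

For the slow gradient, 3, 3, 4, 4 becomes 3, 3, 3, 3, 4, 4, 4: no new shades appear, because the midpoint of 3 and 4 is not representable. For the fast gradient, 3, 5, 9, 13 becomes 3, 4, 5, 7, 9, 11, 13: every inserted step is a genuinely new 15-bit color.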

Like the 24bit colour thing, I'm not too eager to restructure the graphics pipeline from array-of-structs to struct-of-arrays... but if it provides a significant speed increase, it may be worth it. I'd be very interested to see any performance profiling results — even if you find no significant speed increase, or even a decrease, that's useful to know.
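For quick before/after numbers on a change like AoS to SoA, even a tiny `std::chrono` harness like this sketch gives a first signal; a sampling profiler (perf, VTune, the Visual Studio profiler) is better at showing where the time actually goes. `meanNanoseconds` is a hypothetical helper, and the work being timed (for example, rendering the same frame repeatedly) is up to the caller.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Run `work` `iterations` times and return the mean wall-clock time per run
// in nanoseconds, measured with a monotonic clock.
double meanNanoseconds(const std::function<void()>& work, int iterations) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) work();
  auto elapsed = std::chrono::steady_clock::now() - start;
  return std::chrono::duration<double, std::nano>(elapsed).count() / iterations;
}
```

Running both layouts over the same input, with optimizations enabled and several repetitions, makes it easier to tell a real speedup from noise.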

DerKoun commented 3 years ago

Visually, the main issues with high-res windowing are the tops and bottoms of shapes and, currently, very flat (non-steep) diagonal lines or curves; neither is smooth (the latter is caused by a fix for horizontal edges). But my bigger concern is the code. Besides being written by a C++ novice, it's not designed very well: it's just simple averaging, with more and more conditions piled on top. I'm sure there are existing algorithms that can handle this better and faster, but that's outside my area of expertise. Maybe 'prototype' would fit better than 'experimental': it's a proof of concept that should be rewritten around a better concept. I also worry about performance because, among other things, the boolean array that disables pixels on the main and sub screens according to windowing had to be changed from native pixels to HD pixels, i.e. its size increased from 256 to 256*scaleFactor*scaleFactor.
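The size increase is easy to quantify: each native scanline becomes `scale` HD sub-rows of `256 * scale` pixels, so the mask grows with the square of the scale factor (16x at 4x scale). The sketch below assumes the mask is stored per native scanline in row-major order; `hdMaskEntries` and `hdMaskIndex` are hypothetical helpers, not the bsnes-hd layout.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kNativeWidth = 256;

// Entries in one native scanline's window mask at a given scale factor:
// `scale` sub-rows, each 256 * scale HD pixels wide.
constexpr std::size_t hdMaskEntries(std::size_t scale) {
  return kNativeWidth * scale * scale;
}

// Row-major index of HD pixel hx on sub-row subRow within that mask.
constexpr std::size_t hdMaskIndex(std::size_t hx, std::size_t subRow, std::size_t scale) {
  return subRow * kNativeWidth * scale + hx;
}

static_assert(hdMaskEntries(1) == 256, "native size is unchanged at 1x");
static_assert(hdMaskEntries(4) == 4096, "4x scale needs 16x the entries");
```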

Ironically, I plan to add code to prevent averaging of background colors between lines whose colors differ too much, as I have at least one report of graphical issues that this should fix (Super Metroid, IIRC). Generally, I doubt that averaging with 15-bit colors would be a relevant improvement in many games. (I won't say that no game at all would benefit, but in my estimate not enough to justify the additional code.) One way to add it without global changes could be some form of per-line dithering to approximate the higher color depth. I'm not sure that is practical; it's just an idea.
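One way such a guard might look, as a minimal sketch: blend toward the next line's background color only when the largest per-channel difference is under a threshold, and otherwise treat the jump as an intentional hard edge. The max-channel metric, the threshold value, and the names `maxChannelDelta` and `blendIfSimilar` are assumptions for illustration, not the check bsnes-hd actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>

struct RGB888 { uint8_t r, g, b; };

// Largest absolute per-channel difference between two colors.
int maxChannelDelta(RGB888 a, RGB888 b) {
  return std::max({std::abs(a.r - b.r), std::abs(a.g - b.g), std::abs(a.b - b.b)});
}

// Blend `cur` toward `next` by fraction t (0..1), rounded to nearest, but
// only if the lines are similar; large jumps are left untouched.
RGB888 blendIfSimilar(RGB888 cur, RGB888 next, double t, int threshold) {
  if (maxChannelDelta(cur, next) > threshold) return cur;
  auto mix = [t](uint8_t a, uint8_t b) {
    return static_cast<uint8_t>(a + (b - a) * t + 0.5);
  };
  return { mix(cur.r, next.r), mix(cur.g, next.g), mix(cur.b, next.b) };
}
```

With a threshold of 16, a step from (100, 100, 100) to (110, 100, 90) is smoothed to (105, 100, 95) at the halfway point, while a jump from black to (200, 0, 0) stays black.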

Once I get to the restructuring, I'll benchmark any usable state of the code and report the intermediate results to any issue tracker and Discord that wants them. I hope that in turn people with more C++ and optimization experience will point out any obvious mistakes I make (it's pretty much the first time I'm working on C++ code specifically to optimize it for performance).

Screwtapello commented 3 years ago

With the way windowing works, I don't think you're ever going to have a nice solution for near-horizontal lines. If you just did the most basic interpolation, with no extra fix-ups, how terrible would that look? How invasive would the code changes be? If there's a simple implementation, it's easier for other people to come along and suggest refinements.

My high-level understanding of this code is that the fast PPU takes a snapshot of the PPU state at each scanline, and then the scanlines are rendered in parallel, each using their own copy of the PPU state. I feel like there's a golden opportunity to upload all that PPU state to the GPU and let it do the rendering at whatever output resolution it likes, but that would probably require a radical replumbing of how (and what kind of) data is transferred from the emulation core to the GUI. Perhaps as an initial step, you could add a keyboard shortcut to dump all the PPU state for a given frame out to a file, or a bunch of files, and then write a standalone program in some more convenient language to read that data and render it on the GPU? Again, as a prototype and proof-of-concept, not as a permanent solution. If nothing else, it seems like the sort of thing RetroArch would pour resources into once it was proven to be possible.
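A minimal sketch of the "dump a frame's PPU state to a file" step, assuming a hypothetical `ScanlineState` with a handful of illustrative fields (the real fast-PPU snapshot holds far more, including VRAM, CGRAM and OAM contents). The format is a 32-bit line count followed by raw records, which ties the file to this compiler's struct layout and byte order. That is acceptable for a throwaway prototype but not for a stable interchange format. Wiring it to a keyboard shortcut is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <ios>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical per-scanline snapshot; field names are illustrative only.
struct ScanlineState {
  uint8_t bgMode;
  uint16_t bgHOffset[4];
  uint16_t bgVOffset[4];
  uint8_t window1Left, window1Right;
  uint16_t fixedColor;  // BGR555
};
static_assert(std::is_trivially_copyable<ScanlineState>::value,
              "raw byte dump requires a trivially copyable snapshot");

// Write: uint32_t line count, then the raw ScanlineState records.
bool dumpFrame(const std::string& path, const std::vector<ScanlineState>& lines) {
  std::ofstream out(path, std::ios::binary);
  if (!out) return false;
  uint32_t count = static_cast<uint32_t>(lines.size());
  out.write(reinterpret_cast<const char*>(&count), sizeof count);
  out.write(reinterpret_cast<const char*>(lines.data()),
            static_cast<std::streamsize>(lines.size() * sizeof(ScanlineState)));
  return static_cast<bool>(out);
}

// Read a file written by dumpFrame; returns an empty vector on failure.
std::vector<ScanlineState> loadFrame(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  uint32_t count = 0;
  in.read(reinterpret_cast<char*>(&count), sizeof count);
  std::vector<ScanlineState> lines(in ? count : 0);
  in.read(reinterpret_cast<char*>(lines.data()),
          static_cast<std::streamsize>(lines.size() * sizeof(ScanlineState)));
  if (!in) lines.clear();
  return lines;
}
```

A standalone renderer in another language would only need to mirror the record layout to read these files and upload them to the GPU.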