Russian-Doom / russian-doom

A limit-removing source port of Doom, Heretic and Hexen. It has numerous vanilla bug fixes, enhanced 640x400 and 1280x800 rendering resolutions, improved game palettes and offers many optional aesthetic game enhancements along with the maximum possible translation to the Russian language.
GNU General Public License v2.0

[Request] Option to disable intermediate buffer for 2x performance on Raspberry Pi #158

Closed vanfanel closed 4 years ago

vanfanel commented 4 years ago

Hello there, @JNechaevsky

I have just built Russian Doom on the Raspberry Pi 3 for the first time. I read about it when you posted in my Heretic crispness question in the CrispyDoom GitHub issues section, and I find it's a great source port, with uncapped framerates in Heretic!

Sadly, Russian Doom lacks a simple option that could easily make the game run at rock-solid 60FPS on the Pi as Crispy does.

In CrispyDoom, when "smooth pixel scaling" is disabled in the Crispness menu, the framebuffer is loaded directly into the texture that gets scaled to the screen, without any intermediate step. That is great, because no intermediate buffer is used, so the game goes from 30FPS to 60FPS. Could such an option be added to RussianDoom too, please?

I have tried setting #define hires 0 in src/i_video.h, but the intermediate buffer is still used that way. We need to disable the intermediate buffer copy, too.

I have also spotted the "Smooth pixel scaling" option in russian-setup (which seems similar to the "smooth pixel scaling" option in Crispy), but disabling it still leaves the engine using the intermediate buffer copy.

JNechaevsky commented 4 years ago

Hi @vanfanel!

But it's already there, right in the game's "Rendering" menu (and smoothing is always disabled by default): (screenshot)

So, even with the "sharp" option you still don't get 60 fps? That's odd, because there are exactly three places where scaling quality is set, and it's always nearest:

1) https://github.com/JNechaevsky/russian-doom/blob/532e6dc20f1fdac5de1ca229cf36370070e6f163/src/i_video.c#L641
2) https://github.com/JNechaevsky/russian-doom/blob/532e6dc20f1fdac5de1ca229cf36370070e6f163/src/i_video.c#L1302
3) https://github.com/JNechaevsky/russian-doom/blob/532e6dc20f1fdac5de1ca229cf36370070e6f163/src/i_video.c#L1495

vanfanel commented 4 years ago

@JNechaevsky Thanks for your response :) It's not about nearest vs linear. In fact, in SDL2 using the GLES2 renderer on the Raspberry Pi, linear looks better and is by no means slower.

It's about the intermediate buffer used to upscale. Look at this explanation:

https://github.com/JNechaevsky/russian-doom/blob/532e6dc20f1fdac5de1ca229cf36370070e6f163/src/i_video.c#L66

As you can see, there is "texture" and "texture_upscaled". There should be a way to draw "texture" directly to the screen in non-hires mode, so the original 320x200 "texture" gets rendered straight to the screen without going through "texture_upscaled".

This is exactly where the unneeded double copy is done: https://github.com/JNechaevsky/russian-doom/blob/532e6dc20f1fdac5de1ca229cf36370070e6f163/src/i_video.c#L758

What I am trying to say is that "texture" should be copied directly to the renderer, NOT using "texture_upscaled" at all.
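
To put rough numbers on that extra copy, here is a small sketch in C. It assumes the intermediate "texture_upscaled" is sized to the smallest integer multiple of the 320x200 frame that covers the output; the helper names and the 1920x1080 output size are hypothetical and not taken from i_video.c.

```c
#include <stdint.h>

// Integer division, rounding up.
static int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

// Pixels in the intermediate upscaled texture, ASSUMING it is sized to the
// smallest integer multiple of the source frame that covers the output.
// Hypothetical helper for illustration only.
static int64_t upscaled_pixels(int out_w, int out_h, int src_w, int src_h)
{
    int64_t w = (int64_t) src_w * ceil_div(out_w, src_w);
    int64_t h = (int64_t) src_h * ceil_div(out_h, src_h);

    return w * h;
}
```

Under that assumption, `upscaled_pixels(1920, 1080, 320, 200)` describes a 1920x1200 intermediate texture, about 2.3 million pixels filled every frame before the final copy to the screen starts. Drawing "texture" directly skips that whole pass.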

vanfanel commented 4 years ago

@JNechaevsky I have already achieved rock-solid 60FPS here on the Pi, following my own theory. This is how I have left your rendering code to achieve that:

    // Make sure the pillarboxes are kept clear each frame.

    SDL_RenderClear(renderer);

    // Render the original texture directly to screen (the default
    // render target), without going through texture_upscaled.

    SDL_SetRenderTarget(renderer, NULL);
    SDL_RenderCopy(renderer, texture, NULL, NULL);

    // Draw!

    SDL_RenderPresent(renderer);

Everything up to this point in the function is unchanged. As you can see, "texture_upscaled" is not used anymore, and I am seeing 60FPS now. What I am asking for is an option for this, so I don't have to mutilate the code each time I build RussianDoom :D

JNechaevsky commented 4 years ago

That was quick. :)

To be honest, all this stuff is too complicated for me; I'm pretty good at art/drawing, but not at code itself. The code you provided works fine on my side (desktop PC, Windows), except for pixel smoothing.

So, to keep everyone happy, with working smoothing and possibly a bit more performance without smoothing, something like this should be done, right?

    if (smoothing)
    {
        // Render this intermediate texture into the upscaled texture
        // using "nearest" integer scaling.

        SDL_SetRenderTarget(renderer, texture_upscaled);
        SDL_RenderCopy(renderer, texture, NULL, NULL);

        // Finally, render this upscaled texture to screen using linear scaling.

        SDL_SetRenderTarget(renderer, NULL);
        SDL_RenderCopy(renderer, texture_upscaled, NULL, NULL);
    }
    else
    {
        // No smoothing: render the original texture straight to screen.

        SDL_SetRenderTarget(renderer, NULL);
        SDL_RenderCopy(renderer, texture, NULL, NULL);
    }

    // Draw!

    SDL_RenderPresent(renderer);

fabiangreffrath commented 4 years ago

@JNechaevsky I have a switch for this in Crispy that affects these exact lines of code.

JNechaevsky commented 4 years ago

@fabiangreffrath, :open_mouth: oh, I see, thank you!

JNechaevsky commented 4 years ago

@vanfanel all done in https://github.com/JNechaevsky/russian-doom/commit/0a5184f83e8b704f0402bfb7b122e203f336afaa ! I'm still wondering about the possible performance increase... I guess I need a not-very-powerful PC for some benchmark tests; a tablet PC should be suitable for some -timedemo runs.

vanfanel commented 4 years ago

@JNechaevsky Great! I just tried the latest commit and indeed it does run at 60FPS now on the Raspberry Pi 3! :) A Pi is a good testing platform, it's cheap, and you can run GNU/Linux on it with no Xorg server at all (SDL2 runs accelerated directly on KMSDRM, how cool is that?), so I can recommend it as a testing platform.

Another problem with speed, as I told @fabiangreffrath, is that ugly SDL_BuildAudioCVT call in src/i_sdlsound.c. SDL_BuildAudioCVT should only be called once, and the SDL_AudioCVT pointer it receives as its first parameter should be reused forever. Calling SDL_BuildAudioCVT each time a sound is played is just plain wrong because it is very slow.

Thankfully, we have the non-SDL_Mixer resampling code just under that, so I changed https://github.com/JNechaevsky/russian-doom/blob/a8c91e230a1cf7d4f5bac8dee9b329aff8ccb47d/src/i_sdlsound.c#L642 to if (false), and that way I avoid the ugly slowdowns that sometimes happen when a sound is played.

fabiangreffrath commented 4 years ago

You wanted to bring this up upstream at the Choco bug tracker, didn't you? 😉

JNechaevsky commented 4 years ago

Just checked replacing the whole if (samplerate <= mixer_freq ... condition mentioned above: everything seems to be fine, so I'm all okay with getting it replaced. :sparkles:

It should be this, yes?

    // If we can, use the standard / optimized SDL conversion routines.

->  if (false)
    {
        convertor.len = length;

@vanfanel, purchasing a Raspberry is not a problem; the problem is that I'm not a Linux power user, so I would barely be able to use all its features.

Also, abusing a tablet PC may not be a very good idea. I totally forgot that a nettop with a pretty weak CPU (AMD G-T56N @ 1.65 GHz) is lying right in front of my eyes. AFAIK, it's an interesting CPU, designed to play video streams/files, and the nettop itself was primarily created to be used with a TV. I'm using it as a file archive.

@vanfanel, @fabiangreffrath, and while we are talking about sound, I remembered one interesting... bug. I'll explain. A few years ago, while Choco/RD was using the SDL1 libraries, I got a basic bug report: "there is always a small chance that a sound may play at full volume for one tic". Yep, I was also seeing this, but it happens really rarely and mostly right before exiting the game. I was absolutely sure it was SDL1-specific, but no, it may happen even with SDL2.

Some time later @jmtd asked me to capture this in a video recording, but I never did. And even if I did, it would simply be uninformative: it would not reveal the cause.

Later still, I was browsing DoomWiki and found an interesting comment by @nukeykt: https://doomwiki.org/wiki/Talk:Sounds_changing_pitch_on_slow_computers

Could it be related? Do you happen to have it?

I really have no idea whether this one-tic-loud bug happens on non-Windows systems, but I can swear it still happens in Chocolate, Crispy and RusDoom nowadays on Windows.

JNechaevsky commented 4 years ago

p.s. @vanfanel you may wish to check out Heretic: it now has a language hot-swapping switch, as well as a visually improved menu and some new rendering features, the same as Doom. Just hit this item (screenshot) in the options menu once and you're all set.

I haven't released it yet, I need to double-check everything, but it's pretty much in release-ready condition. And here's why I'm afraid to PR it to Crispy: although the menu looks pretty good, under the hood it's a bit messy and not optimized much. @fabiangreffrath will do something awful to me if I try to make a PR with the current menu implementation (and will be right!). :frowning:

fabiangreffrath commented 4 years ago

Erm, no I won't. I haven't even seen the code yet, so why do you believe I won't like it? People ask for Heretic all the time and any contribution to this game will be highly welcome. If there are style or logic issues, we will overcome them. Remember how well your latest PR for Choco went?

JNechaevsky commented 4 years ago

@fabiangreffrath have a look at this, please. It's an updated Display options menu: (screenshot)

Does it look good enough? Yep, it does. It's shifted a bit to the left, but I prefer to have the same X-position for both languages, and Russian strings are longer. So what's wrong with it? Let's see:

  1. First of all, golden titles are not present in the standard menu system; they are placed separately: https://github.com/JNechaevsky/russian-doom/blob/master/src/heretic/mn_menu.c#L513

  2. Unlike the common Heretic menu, this submenu uses a small font and smaller vertical spacing: https://github.com/JNechaevsky/russian-doom/blob/master/src/heretic/mn_menu.c#L1319

  3. And a small arrow: https://github.com/JNechaevsky/russian-doom/blob/master/src/heretic/mn_menu.c#L1374

  4. And a custom slider, because the standard one will not fit well, it's too tall: https://github.com/JNechaevsky/russian-doom/blob/master/src/heretic/mn_menu.c#L4031

  5. Finally, "on/off" items and sliders for features were made in a similar way to Raven's code, with hardcoded offsets: https://github.com/JNechaevsky/russian-doom/blob/master/src/heretic/mn_menu.c#L1789 To prevent strings from being messed up, I have replicated both standard English fonts and made them unchangeable for the options menu.

For the end user this menu will look nice, and it works fine, though. Yeah, making a custom menu for Heretic is even easier than it was in Doom, but code style is question number 1 in Crispy.

JNechaevsky commented 4 years ago

@fabiangreffrath thinking about it more, it's not a big problem. If you could create a patch that brings in some groundwork for the new menu/items, a basis which I should use instead of tearing up the current code, that would be great.

But we still don't have any optional Crispy features in Heretic, so what to fill the menu with is still an open question.

fabiangreffrath commented 4 years ago

> But we still don't have any optional Crispy features in Heretic, so what to fill the menu with is still an open question.

Actually, this is an egg-chicken problem. We won't have more features to choose from as long as we don't have a menu to enable them. Regarding your menu design, I'd be glad if we had a Crispness menu in Heretic like this!

Edit: It's called "egg-chicken", I believe...

JNechaevsky commented 4 years ago

I've got it. :) We can't do one part because the second part is missing, and the second part can't be done without the first one.

So, before I start on the first step, we need to clarify:

Something else?

JNechaevsky commented 4 years ago

Uh-oh. Could you please remind me what I should do to build a Heretic executable under MSYS?

fabiangreffrath commented 4 years ago

make crispy-heretic.exe

drfrag666 commented 4 years ago

@vanfanel How many fps did you get in lowres before adding this? Did low detail make any difference?

drfrag666 commented 4 years ago

I've added this too, as I deduced lowres didn't run well either. As for the SDL_BuildAudioCVT thing, it's still not in the Choco tracker.

JNechaevsky commented 4 years ago

@drfrag666 if you are not using uncapped mode and are locked to vanilla's 35 fps, what do you expect to have? :)

drfrag666 commented 4 years ago

Heh, I didn't think about it, but I expected it could even be too slow to reach 35 fps.

JNechaevsky commented 4 years ago

But you can still do a benchmark test. Just try running this: chocolate-doom.exe -timedemo demo1 about three times to see the average fps before adding @vanfanel's correction, and three times after.

You may also experiment with different demos, but always remember to keep the same window mode/size across tests.
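
For comparing the "before" and "after" runs, a small sketch in C of the averaging might help. It assumes each -timedemo run reports "timed N gametics in M realtics", with realtics counted in 1/35 s units; the function names are hypothetical.

```c
#include <stddef.h>

#define TICRATE 35  // Doom game tics per second

// Frames per second for one -timedemo run, ASSUMING the run reports
// "timed N gametics in M realtics" and realtics are 1/35 s units.
static double timedemo_fps(int gametics, int realtics)
{
    return (double) gametics * TICRATE / realtics;
}

// Mean fps over several runs of the same demo (0 if there are no runs).
static double average_fps(const int *gametics, const int *realtics, size_t runs)
{
    double sum = 0.0;

    for (size_t i = 0; i < runs; i++)
        sum += timedemo_fps(gametics[i], realtics[i]);

    return runs ? sum / runs : 0.0;
}
```

Dividing the "after" average by the "before" average gives the speedup; as noted above, it only means something if the window mode and size are the same for every run.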

vanfanel commented 4 years ago

@drfrag666 Always in uncapped mode (35FPS makes no sense at all on modern displays). I was getting 30FPS with vsync ON (it's always ON in my driver on the Pi, and that's OK because no vsync is ugly as sin). After the modification, I get exactly as many FPS as my vsync rate allows, which means around 60FPS, as intended :) Chocolate Doom needs uncapped framerate, really.

@fabiangreffrath please merge the uncapped framerate code from @JNechaevsky's RussianDoom into crispy-heretic, too, if you can. It works fantastically!

@JNechaevsky I always build the latest version! So I am already enjoying the new menus and details!

vanfanel commented 4 years ago

@JNechaevsky Instead of this (which works but it is an ugly patch I did without thinking much): https://github.com/JNechaevsky/russian-doom/commit/5ca811bbcc0429f45dada633dec93814e7693686 you should go for proper sound fx caching during startup like this: https://github.com/chocolate-doom/chocolate-doom/pull/1246

drfrag666 commented 4 years ago

> 35FPS makes no sense at all on modern displays

So it didn't cut it. I could argue that playing Doom on modern displays makes no sense; 35 fps is fine for me. Playing on a 386 at 12 fps was another thing. But really, on most LCD screens the game looks like shit and no amount of correction will really help, be it in-game or driver adjustments, a custom palette like Julia's, or shaders, since the screen cannot display black or even get remotely close. Everything looks foggy and grayish, and colors are dim. Color representation is especially poor on cheap displays, mine being bluer than the Smurfs' village, so I must use a correction app. I used a CRT and I still own it, but I barely use that computer now after the accident.

vanfanel commented 4 years ago

@drfrag666 Yes, but it's impossible for modern displays over HDMI to go 70Hz, and even if they could, why should we switch physical resolution at all for a game? Just let the game run at a speed dictated by the physical mode in use. That is why uncapped framerate makes the Crispy and Russian Doom source ports move/animate so well. I would love Chocolate to have this option too. It's an option, after all, and it makes Doom movement look better than 35FPS does on a typical 60Hz HDMI mode. In other words, uncapped framerate makes Doom look more like it was intended to look than 35FPS on a 60Hz display, so please consider it as an option.
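
The mismatch between 35FPS and a 60Hz mode can be illustrated with a little arithmetic. The sketch below uses a simplified model in which refresh number r shows game tic floor(r * tic_hz / refresh_hz); the function name is hypothetical and the model ignores any real timing jitter.

```c
// Number of display refreshes during which game tic `tic` is the one on
// screen, for a game capped at tic_hz on a display running at refresh_hz.
// Simplified model: refresh r shows tic floor(r * tic_hz / refresh_hz).
static int refreshes_showing_tic(int tic, int tic_hz, int refresh_hz)
{
    int count = 0;

    // Scan a window of refreshes large enough to cover this tic.
    for (int r = 0; r <= (tic + 1) * refresh_hz / tic_hz; r++)
    {
        if (r * tic_hz / refresh_hz == tic)
            count++;
    }

    return count;
}
```

In this model, at 35 tics on a 60Hz display the first seven tics are held for 2, 2, 2, 1, 2, 2, 1 refreshes, so motion advances unevenly. At 70Hz every tic is held for exactly 2 refreshes, which is the even pacing a 60Hz mode cannot reproduce with a 35FPS cap.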

JNechaevsky commented 4 years ago

@vanfanel, oh, I'll update the code in the evening, thanks again!

@drfrag666, yeah, the standard palette is the most painful thing nowadays. I still remember well how Doom looked in 1994 on a 14" CRT monitor: it had dark and juicy colors. I was playing around with various approaches to palette darkening in #5, but even after consolidating the improved and standard palettes into one set, the improved palette is still a different palette, which is not really modding-friendly.

If I'm not mistaken, @bradharding has implemented a much smarter approach by darkening the palette in code, making the improved colors work with any kind of PLAYPAL palette.

drfrag666 commented 4 years ago

That's not how it works, DOS ran at 70 Hz. LinuxDoom supported unlimited framerate, but it's capped to 35 Hz because the playsim runs at that frequency. Crispy has truly uncapped framerate, yes, but it's also limited to 70 and 60 to avoid stressing the hardware. I ported variable framerate for the ISA VGA simulation so RUDE can run even slower. But I'd need interpolation for things to be smooth, and that's beyond the scope of the project (but widescreen isn't). Chocolate aims to emulate DOS Doom, so no way.
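
For readers wondering what interpolation means here: the playsim still advances at 35 Hz, and each rendered frame draws objects at a position blended between the previous and the current tic. Below is a minimal sketch in C using Doom-style 16.16 fixed point; the function names are hypothetical and not taken from any of the ports.

```c
#include <stdint.h>

typedef int32_t fixed_t;            // 16.16 fixed point, as in Doom

#define FRACBITS 16
#define FRACUNIT (1 << FRACBITS)

// 16.16 fixed-point multiply.
static fixed_t fixed_mul(fixed_t a, fixed_t b)
{
    return (fixed_t) (((int64_t) a * b) >> FRACBITS);
}

// Position to draw for a frame that falls between two 35 Hz tics.
// frac says how far we are into the current tic: 0 means the previous
// tic's position, FRACUNIT means the current tic's position.
// Hypothetical helper for illustration only.
static fixed_t lerp_between_tics(fixed_t prev, fixed_t curr, fixed_t frac)
{
    return prev + fixed_mul(curr - prev, frac);
}
```

A renderer running faster than 35 fps would call something like this for every moving coordinate and view angle each frame, which is why it touches far more code than a simple frame limiter.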

drfrag666 commented 4 years ago

Brad changed the gamma tables; they are generated in code now. But one thing you could do is edit the first table, not that I see a difference though. https://github.com/drfrag666/chocolate-doom/commit/cf897ef70cf7196f57ce29eb12eda9b1fd51fc4f I added your palette as a display option, I think I already told you. There are VA panels, but they are not used in laptops.

drfrag666 commented 4 years ago

> This SDL_BuildAudioCVT should only be called once, and the SDL_AudioCVT pointer it receives as its first parameter should be re-used forever. Calling SDL_BuildAudioCVT each time a sound is played is just plain wrong because it is very slow.

That's not true, not all sounds have the same sample rate; maybe it's slow on the Pi, but not on a PC. About that caching, I'm not sure how it would affect the disk icon; it certainly was not in the DOS version, and Choco aims to emulate that.
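
One way to reconcile the two points (build the converter rarely, yet cope with sounds at different sample rates) would be to cache one converter per distinct source rate. The sketch below is only an illustration in C: `converter_t` is a hypothetical stand-in for the real SDL_AudioCVT setup, and none of these names come from i_sdlsound.c.

```c
#include <stddef.h>

#define MAX_RATES 8

// Hypothetical stand-in for an expensive converter setup such as the one
// SDL_BuildAudioCVT performs. The real SDL_AudioCVT is not used here.
typedef struct
{
    int src_rate;
    int dst_rate;
} converter_t;

static converter_t cache[MAX_RATES];
static size_t cache_len = 0;
static int builds = 0;              // how many times the setup actually ran

// Return a converter for this rate pair, building it only the first time
// the pair is seen. Returns NULL if the cache is full.
static const converter_t *get_converter(int src_rate, int dst_rate)
{
    for (size_t i = 0; i < cache_len; i++)
    {
        if (cache[i].src_rate == src_rate && cache[i].dst_rate == dst_rate)
            return &cache[i];
    }

    if (cache_len == MAX_RATES)
        return NULL;

    cache[cache_len].src_rate = src_rate;
    cache[cache_len].dst_rate = dst_rate;
    builds++;                       // the expensive setup would happen here

    return &cache[cache_len++];
}
```

With sounds at, say, 11025 Hz and 22050 Hz mixed at 44100 Hz, the setup would run twice in total no matter how many sounds are played. Whether that fits Choco's goal of matching DOS behaviour is a separate question.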

JNechaevsky commented 4 years ago

I think we're done here. Thanks, colleagues!