ata4 / angrylion-rdp-plus

A low-level N64 video emulation plugin, based on the pixel-perfect angrylion RDP plugin with some improvements.

Expose busyloop option for the Project64 API. #140

Closed ghost closed 2 years ago

ghost commented 2 years ago

There doesn't seem to be a noticeable performance difference between multithreaded and single-threaded rendering when using the "Slow, few glitches" option. The only difference I could find is that CPU usage is about 8 times higher.

In other words, whether multithreaded rendering is on or off, screens where emulation speed dips below 100% perform exactly the same.

Now, I don't know if this behaviour is intended, but using this option makes multithreading essentially redundant.

CPU specifications: Ryzen 7 5800X, 8 cores / 16 threads, 3.8 GHz base clock, 4.7 GHz boost clock

Jj0YzL5nvJ commented 2 years ago

Well, that description certainly leaves room for misunderstanding. It's useless to synchronize a thread with itself. "Fast" means async threads (DpCompat = 0) and "Slow" means sync threads (DpCompat = 2).

I think the most important part for achieving true synchronization on AMD processors is not exposed in the plugin for Project64; it is only exposed in mupen64plus =P (BusyLoop).

https://imgur.com/a/LO46NgT (Watch them in order; the sorting is messed up.)
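For readers unfamiliar with the term, a busy loop in general means a thread keeps polling for a signal instead of going to sleep until it is woken. The sketch below contrasts the two waiting styles. It is only an illustration under my own assumptions, not the plugin's or mupen64plus's actual code, and the names `wait_blocking`, `wait_busy`, and `run_worker` are hypothetical.

```python
import threading


def wait_blocking(ready: threading.Event) -> None:
    """Sleep until signalled; the thread uses no CPU while it waits,
    but it has to be woken up by the scheduler first."""
    ready.wait()


def wait_busy(ready: threading.Event) -> int:
    """Spin until signalled; the thread never sleeps, so it keeps a
    core busy for the whole wait. Returns the number of polls."""
    spins = 0
    while not ready.is_set():
        spins += 1
    return spins


def run_worker(wait_fn) -> list:
    """Start one hypothetical worker, signal it, and collect its result."""
    ready = threading.Event()
    result = []

    def worker():
        wait_fn(ready)              # wait for the "work is available" signal
        result.append("rendered")   # stand-in for the real rendering work

    thread = threading.Thread(target=worker)
    thread.start()
    ready.set()                     # signal the worker
    thread.join(timeout=5)
    return result


if __name__ == "__main__":
    print(run_worker(wait_blocking))
    print(run_worker(wait_busy))
```

Both variants produce the same result; they differ only in what the waiting thread does in the meantime.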

In any case, it's better never to use all available threads.

First of all, there are many myths and misconceptions about multithreading in general. On top of that come misunderstandings about how such techniques benefit video games, together with the firm belief of some end users that "the more the better" and/or "the newer the better", when in reality, in many cases, "less is more and better".

What am I getting at with all this?

Multithreading needs to be tuned in some cases, which is why it keeps getting better. Some CPUs benefit more than others. Too many threads can impair performance in some games because more time is spent syncing them. Some newer Intel processors don't have hyperthreading, and some older processors work better without it... etc.
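The point about sync overhead can be illustrated with a toy cost model. This is purely a sketch: the linear per-thread sync cost and the numbers (`work_ms`, `sync_ms_per_thread`) are made-up assumptions, not measurements from this plugin or any CPU.

```python
def frame_time_ms(threads, work_ms=16.0, sync_ms_per_thread=0.5):
    """Toy estimate of time per frame.

    Assumes the rendering work splits evenly across threads and that
    every extra thread adds a fixed synchronization cost. Both
    assumptions are hypothetical simplifications.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return work_ms / threads + sync_ms_per_thread * (threads - 1)


def best_thread_count(max_threads, **model):
    """Return the thread count with the lowest modelled frame time."""
    return min(range(1, max_threads + 1),
               key=lambda n: frame_time_ms(n, **model))


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, "threads:", round(frame_time_ms(n), 2), "ms")
    print("best:", best_thread_count(16))
```

With these invented parameters the modelled frame time falls at first and then rises again as the sync term dominates, so the best count is well below the maximum.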

ghost commented 2 years ago

Interesting, is there a reason BusyLoop is not exposed on the Project64 API side?

Edit: I managed to expose busy loop on the Project64 API side and have been messing around with the settings. It does actually seem like "more threads -> worse performance", which makes sense, given that 16 threads have to wait for each other to sync. Busy loop seems to "solve" that at the expense of massive power draw and CPU usage. Again, makes sense. Now all one has to do is lower the thread count, find the lowest number of threads at which it still runs at full speed, and voilà.
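The tuning procedure described above could be sketched roughly like this. `measure_speed` is a hypothetical callback that returns the emulation speed in percent for a given thread count; in practice you would read that number off the emulator by hand, since nothing here talks to the plugin.

```python
def lowest_full_speed_threads(measure_speed, max_threads, target=100.0):
    """Walk down from max_threads and return the smallest thread count
    that still reaches the target speed, or None if none does.

    Stops at the first drop below target after a passing count has
    been found, so it returns the bottom of the first passing range.
    """
    best = None
    for n in range(max_threads, 0, -1):
        if measure_speed(n) >= target:
            best = n        # still full speed, keep going lower
        elif best is not None:
            break           # speed dropped again, stop searching
    return best


if __name__ == "__main__":
    # Made-up readings: full speed only between 3 and 8 threads.
    def fake_speed(n):
        return 100.0 if 3 <= n <= 8 else 90.0

    print(lowest_full_speed_threads(fake_speed, 16))
```

Starting from the top means counts that are too high (and fail because of sync overhead) are simply skipped until the first one that holds full speed.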

Jj0YzL5nvJ commented 2 years ago

> Interesting, is there a reason BusyLoop is not exposed on the Project64 API side?

The author who originally exposed BusyLoop in mupen did so just to improve performance on Android. It was a lucky accident that this also fixed mupen sync on AMD processors. See #115 and #105.