YukihoAA / waifu2x_snowshell

Multilingual waifu2x GUI shell for Windows x64
MIT License

Use both GPU and CPU when batch converting? #9

Closed · NintendoManiac64 closed this 4 years ago

NintendoManiac64 commented 5 years ago

The Intel integrated graphics on my lowly dual-core desktop Intel Haswell processor takes around 45 seconds to apply 2x scaling and low noise reduction to a 960x544 image.

Interestingly though, setting snowshell to use my CPU results in the exact same process also taking around 45 seconds (albeit with my CPU overclocked to 4.6GHz), and Task Manager confirms that it's very much using my CPU instead.

I then got the idea to launch two instances of snowshell, one window set to use the GPU and the other set to use the CPU. Sure enough, it was then able to complete two 960x544 images with the same 2x scaling + low noise reduction in only around 50 seconds, which is nearly half the time it would take if snowshell tried processing the same two images only through the GPU or only through the CPU.

So wouldn't it make sense for snowshell to have a built-in option to use both the CPU and the GPU, specifically when converting multiple files? Obviously it can't use the GPU and CPU at the same time for a single image, but this way it could at least process two files at once rather than a single file at a time.

Also, ideally it shouldn't just split the allocation evenly (that is, half goes to the GPU and half goes to the CPU), since that would be no good when using a GPU that is vastly faster than your CPU or vice-versa. Therefore it'd probably be best to treat it as one single queue of files to convert for both the CPU and GPU, and simply allocate the next image in the queue to whichever processor finishes first and is ready for another image.
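
Something like a single shared queue is what I have in mind; here's a rough sketch, assuming a command-line converter (the converter name and flags below are placeholders, not any converter's real options):

```cpp
#include <cstdlib>
#include <mutex>
#include <queue>
#include <string>
#include <thread>

std::mutex queue_mutex;
std::queue<std::string> files;  // shared queue of images still to convert

// Each worker pops the next file and runs the external converter with its
// own processor flag; whichever worker finishes first simply takes the next
// file, so a faster processor naturally handles more of the batch.
void worker(const std::string& processor_flag) {
    for (;;) {
        std::string file;
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            if (files.empty()) return;
            file = files.front();
            files.pop();
        }
        // Placeholder command line; the real converter and flags differ.
        std::string cmd = "waifu2x-converter " + processor_flag +
                          " --scale 2 --noise 1 -i \"" + file + "\"";
        std::system(cmd.c_str());  // blocks until this file is converted
    }
}

int main() {
    for (const char* f : {"a.png", "b.png", "c.png", "d.png"})
        files.push(f);

    std::thread gpu(worker, "--use-gpu");    // placeholder "use GPU" flag
    std::thread cpu(worker, "--force-cpu");  // placeholder "force CPU" flag
    gpu.join();
    cpu.join();
}
```

With this approach a fast GPU naturally ends up taking most of the files, while a slow CPU only takes what it can actually finish in time.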

...I did however notice that snowshell gives ever-so-slightly different results depending on whether the image is processed on the CPU or the GPU. This may only be a mathematical difference rather than anything a human could actually notice, since I only discovered it via Paint.NET's "Difference" layer blend mode rather than by actually looking at the results, and the difference may well be impossible to spot visually anyway.
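
For reference, that kind of check can also be done numerically rather than by eyeballing a difference layer; here's a rough sketch assuming the single-header stb_image library is available (the file names are just examples of the same image upscaled once per processor):

```cpp
#define STB_IMAGE_IMPLEMENTATION
#include "stb_image.h"

#include <cstdio>
#include <cstdlib>

int main() {
    int w1, h1, w2, h2, n;
    // Example file names: the same source image upscaled on GPU and on CPU.
    unsigned char* a = stbi_load("out_gpu.png", &w1, &h1, &n, 3);
    unsigned char* b = stbi_load("out_cpu.png", &w2, &h2, &n, 3);
    if (!a || !b || w1 != w2 || h1 != h2) {
        std::fprintf(stderr, "could not load images, or sizes differ\n");
        return 1;
    }
    int max_diff = 0;    // largest per-channel difference (0-255)
    long differing = 0;  // number of channel samples that differ at all
    for (long i = 0; i < (long)w1 * h1 * 3; ++i) {
        int d = std::abs(a[i] - b[i]);
        if (d > max_diff) max_diff = d;
        if (d != 0) ++differing;
    }
    std::printf("max per-channel difference: %d/255, differing samples: %ld\n",
                max_diff, differing);
    stbi_image_free(a);
    stbi_image_free(b);
    return 0;
}
```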

YukihoAA commented 5 years ago

Using the CPU and GPU in the same batch usually takes more time than using only the GPU; it is only useful on a device without a GPU. There is no way to predict conversion time, so the batch cannot be divided efficiently. As you said, you can just open two Snowshell windows and convert manually if you want, but you should know that while it looks faster, GPU mode also needs CPU and RAM resources, so the GPU conversion speed will decrease because of the CPU-mode conversion. It could be faster if the CPU-mode speed is similar, but it usually is not.

NintendoManiac64 commented 5 years ago

It could be faster if the CPU-mode speed is similar, but it usually is not.

I reckon it could be a case of the iGPU on my Pentium being so woeful that the amount of CPU it actually needs is minimal.

I mean, the two images took around 50 seconds rather than the 45 it took when doing things separately, so this does imply that the GPU does use the CPU at least a little bit, but it was still very much a major speed-up.

I do have a 4c/8t Nehalem Xeon X3470 paired with a Radeon HD 5870 (manually flashed with an HD 5850 BIOS) that would likely be a better test case for a higher-end system, but unfortunately its installation of C++ 2015 and newer is completely borked and nothing I've done has been able to fix it, so I'm unable to run snowshell on that PC (and I didn't want to reinstall the OS since that PC is likely to be retired and/or swapped over to Linux at some point in the semi-near future).

As you said, you can just open two Snowshell windows and convert manually if you want

As for there being no way to predict conversion time, that isn't really what I was getting at.

I was basically thinking of how foobar2000's batch conversion works on a dual-core CPU: it starts with the first two files (one file per core), and as soon as one of them finishes, the now-idle core is given whatever the next unconverted file is. Whenever either core finishes its current file (even if it's the same core that finished the first one), it is assigned the next file in the queue, and so on.

FXWood350 commented 5 years ago

For the average user, running waifu2x on a GPU massively outpaces running it on a CPU. My CPU (an R5 2600, a relatively powerful CPU) converted a particular file in about 1 hour, albeit on Caffe; converting the same file at a higher scale on my GPU (V64) takes mere seconds. For the average user, it is highly likely that the GPU will have finished all of its conversions by the time the CPU finishes one image. It's just a fact that neural nets are innately more parallel and so run much faster on a GPU than on a CPU. The average midrange GPU has 640 CUs, compared to a CPU's average of 4.

NintendoManiac64 commented 5 years ago

My CPU (an R5 2600, a relatively powerful CPU) converted a particular file in about 1 hour, albeit on Caffe

Bloody Nora, just how large an image was this? As I said, I was converting 960x544 images to 1920x1088 in only 45 seconds on a mere 2-core/2-thread desktop Haswell (albeit overclocked to 4.6GHz).

...unless maybe Caffe hits major diminishing returns in terms of multi-threaded scaling? I mean, your Zen+ does have somewhat lower single-threaded performance than my Haswell (though certainly not by much!).

The average midrange GPU has 640 CUs, compared to a CPU's average of 4.

You can't compare shaders to CPU cores, as each individual CPU core is vastly more powerful than a single GPU shader. If you had a CPU with just as many cores, it could well run considerably faster than a GPU with the same number of shaders.

NintendoManiac64 commented 5 years ago

I'd just like to mention that a user on the waifu2x-converter-cpp GitHub issue tracker suggested the same thing; they state that their 8-core CPU (exact model unknown) is apparently able to run at about half the speed of their GPU (exact model also unknown), allowing for a roughly 50% faster conversion rate (the user mathed wrong and stated 33% faster, which is the reduction in total time rather than the increase in speed).

I've asked for clarification on which exact CPU and GPU models they have, but we can at least rule out a mainstream Intel 8-core chip, since those launched a week after the user posted, meaning it's likely either a Zen 1/Zen+ based Ryzen or one of Intel's HEDT chips, which in turn means it's likely to be an 8-core/16-thread part as well.

NintendoManiac64 commented 4 years ago

One idea I just had for the case where you're using both the GPU and CPU and one processor is substantially slower: once the faster processor has finished all of the other images and the slower processor is still trying to finish its last one, have the faster processor also start converting that same image.

Then, once either processor finishes, simply terminate and discard the conversion still running on the other processor.

So for example, if your GPU is 5x faster than your CPU and it takes 1 minute to convert a single image on your GPU but 5 minutes on your CPU, then even if the CPU is halfway done with the final image when the GPU finishes all of its allocated images, it would still be faster for the GPU to start converting that same image and, once the GPU is done, cancel the CPU's conversion (the CPU would still need another 2.5 minutes versus 1 minute on the GPU).
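
The race would look something like this; it's only an in-process sketch with made-up convert functions and a cooperative cancellation flag, since with the real external converters the slower child process would instead have to be terminated (e.g. via TerminateProcess on Windows):

```cpp
#include <atomic>
#include <iostream>
#include <string>
#include <thread>

std::atomic<bool> finished{false};

// Hypothetical conversion loop that checks the shared flag between tiles so
// the losing processor can stop early and throw away its partial output.
void convert(const std::string& file, const std::string& device) {
    for (int tile = 0; tile < 100; ++tile) {
        if (finished.load()) {
            std::cout << device << " cancelled, discarding partial output\n";
            return;
        }
        // ... process one tile of `file` on `device` ...
    }
    if (!finished.exchange(true))
        std::cout << device << " finished " << file << " first; keeping its output\n";
}

int main() {
    // Both processors race on the same final image; the first to finish wins.
    std::thread gpu(convert, "last_image.png", "GPU");
    std::thread cpu(convert, "last_image.png", "CPU");
    gpu.join();
    cpu.join();
}
```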

This method can also be quite useful when converting only two or three images, where one of your processors can blast through the entire batch before the other processor has a chance to finish even one image.

Another key point of this method is that it doesn't matter whether the CPU or the GPU is the faster processor, which can be important on server and/or offline-video-rendering style PCs that tend to have heavily multi-core CPUs, sometimes paired with no discrete GPU at all or only a low-powered one. Also, there's that upcoming 64-core Threadripper 3990X (as if the current 32-core 3970X wasn't enough of a beast).

YukihoAA commented 4 years ago

Sorry, Snowshell cannot control this. It is a technical limitation: since waifu2x-snowshell is not a stand-alone program, it cannot know whether the PC has only a CPU or also a GPU. That is the main problem.

You keep using foobar2000 as an example, but foobar2000 is a "standalone" program where the equivalent of waifu2x would be an additional plugin, and a plugin is made to fit foobar2000: foobar2000 can simply provide an API and the plugin maker has to follow that standard.

Snowshell is not like that. Snowshell has to obey the rules of each converter, and each converter has its own processor-control options, so all Snowshell can do is force CPU mode or not (in fact, if you choose GPU it just runs the converter with its default option; Snowshell does not know whether a GPU exists, so it cannot handle the Intel iGPU case, and I know that sometimes a converter picks the CPU as its default instead of the iGPU).

I won't implement this, because controlling the GPU/CPU split is not something a shell should do; the shell only supports command-line control.

If you want parallel processing, ask each converter's programmer to support it.

You should also know that the multi-processor support in foobar2000's plugins is controlled by the "plugin", not by foobar2000 itself, which means that in Snowshell's case it would have to be supported by "each converter", not by "Snowshell".