Jason-Bloomer opened 2 years ago
This seems to happen on some AMD GPUs when you get an out of memory error. The way auto tiling works is by detecting these out of memory errors, tiling the image, and trying again. However, on these GPUs, going out of memory crashes the graphics driver and therefore any subsequent attempt by chaiNNer to upscale results in an error.
I'm not sure how to resolve this for auto tiling, but you can for sure fix it by selecting a larger number of tiles (so that each tile would be no more than 512x)
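The catch-and-retry loop described above can be sketched roughly like this. This is pure-Python pseudocode of the idea only; `upscale_region`, the split strategy, and the exception type are hypothetical stand-ins, not chaiNNer's actual API:

```python
def upscale_with_auto_tiling(image_size, upscale_region, max_splits=8):
    """Try the whole image first; on OOM, split into more tiles and retry.

    image_size: (width, height).
    upscale_region: callable taking a region (x, y, w, h) that raises
    MemoryError when the GPU runs out of memory (hypothetical stand-in).
    """
    w, h = image_size
    splits = 1
    while splits <= max_splits:
        try:
            results = []
            tile_w, tile_h = w // splits, h // splits
            for ty in range(splits):
                for tx in range(splits):
                    results.append(
                        upscale_region((tx * tile_w, ty * tile_h, tile_w, tile_h))
                    )
            return results
        except MemoryError:
            # On the problematic drivers this point is never reached cleanly:
            # the OOM crashes the driver, so every later attempt fails too,
            # which is exactly the failure mode described above.
            splits *= 2
    raise RuntimeError("out of memory even at the smallest tile size")
```

This is why a driver that hard-crashes on OOM defeats the scheme: the retry depends on the first failure being recoverable.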
I do not have an AMD GPU; I am running an Nvidia RTX 3060. If I select any option other than Auto, the program also errors out on those same images, with the same error. I've tried every available value and none of them seem to do anything differently; none of the affected images are processed.
EDIT: As a side note, I have a pre-silicon-nerf 3060 with 12GB of VRAM, not the 8GB the newer Ti's ship with.
My GPU memory (according to task manager) doesn't seem to budge when running this program. GPU usage peaks at around 45%, never goes beyond that. My machine has plenty of power to spare, I don't understand why this would be an out of memory issue, unless it's limited to 2GB? When I use NCNN with the exact same model through a .bat file, I'm able to set the tile size in pixels, I usually have it set to 32 (minimum) and it works fine with all the same images in question. Doesn't necessarily eat up any additional VRAM, but the GPU usage is a bit higher.
[Screenshots: GPU usage during the operation (consistent throughout) and after the operation has concluded]
A 32 px tile size would mean it's processing the image in just 32x32 chunks, which is really small, hence the low VRAM usage. I'm planning on implementing a more traditional tile-size algorithm; I've just been procrastinating on it. The current system reuses my auto-tiling code.
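For reference, a traditional fixed tile-size pass (what the NCNN .bat workflow with its 32 px setting effectively does) just walks the image in user-chosen chunks instead of reacting to OOM. A minimal sketch of the tile iteration only; the names are illustrative, not chaiNNer's code:

```python
def iter_tiles(width, height, tile_size):
    """Yield (x, y, w, h) tiles covering the image.

    Edge tiles are clamped to the image bounds, so they may be smaller
    than tile_size. Real upscalers usually also add overlap between
    tiles to hide seams; that is omitted here for brevity.
    """
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield (x, y, min(tile_size, width - x), min(tile_size, height - y))
```

With a fixed tile size the VRAM footprint is bounded up front, which is why the 32 px setting works even on the images that crash auto-tiling.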
Anyway, this is the first time I've seen this error with an Nvidia card. Is there any reason you aren't using PyTorch btw?
> My GPU memory (according to task manager) doesn't seem to budge when running this program.
With 12GB of VRAM you should be able to easily upscale at the very least a 1000x image without even needing tiling. What model are you even using?
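For rough intuition on why 12GB should be plenty: a float32 image tensor costs width x height x channels x 4 bytes, and a 4x model's output has 16x the input's pixel count. The network's internal activations add a large model-dependent multiple on top, and that multiple is the genuinely uncertain part, but the raw tensors are tiny by comparison:

```python
def tensor_mb(width, height, channels=3, bytes_per_elem=4):
    """Approximate size in MiB of a float32 image tensor (illustrative only)."""
    return width * height * channels * bytes_per_elem / 2**20

inp = tensor_mb(1000, 1000)  # roughly 11 MiB for a 1000x1000 RGB input
out = tensor_mb(4000, 4000)  # roughly 183 MiB for the 4x upscaled output
```

Even with a generous activation multiplier, that is nowhere near 12GB, which is why the out-of-memory behavior pointed at something other than a genuinely full card.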
Hold up: Do you have an integrated graphics card? If you do, I think NCNN is trying to use that instead of your Nvidia card, which would explain both the error as well as the fact that it can't go past 512x
> Is there any reason you aren't using PyTorch btw?

I'm attempting to use some NCNN models from other projects, which I like the results of. I've tried quite a few, but they all seem to have the same problem. RealESRGAN-x4plus is the model I'm currently using. There's no option to convert NCNN to anything else, so... there it stays.
I may have found part of the issue. I still had CUDA Toolkit 10.2 installed; I'd never updated it since it still worked for the projects I've been using. I updated to CUDA Toolkit 11.6 and tried to reinstall PyTorch to get it to recompile, both in Anaconda on my system and in chaiNNer itself, since it looks like it uses its own integrated version. However, it's still reporting cu113 when it should be built for cu116.
The GPU usage fluctuated a lot more but the same errors still occur on the same images.
I'm sure my system probably has integrated graphics (IGFX); I'll try rerunning a queue and monitoring its performance to see if it's doing anything.
> RealESRGAN-x4plus

Here are PyTorch versions of all the RealESRGAN models
> However it's still reporting cu113
That's just the version of CUDA that comes with PyTorch. It doesn't use the CUDA toolkit you have installed on your system. Why? Ask the PyTorch team, but it's the reason PyTorch is 3GB.
> reinstall PyTorch to get it to recompile
Installing PyTorch from pip does not compile it. You are downloading a pre-built wheel file.
Also, CUDA & PyTorch have nothing to do with NCNN. NCNN is entirely separate and uses Vulkan for processing.
If your system does have integrated graphics, that would explain the problem. I'm currently working on adding a GPU selector to the settings to hopefully let you pick your Nvidia GPU for NCNN and get rid of this issue. But since you're just using RealESRGAN, you can also just use the PyTorch version, and PyTorch will definitely use your GPU.
@Jason-Bloomer would you mind testing this build and seeing if the error still happens? I changed a small thing about how NCNN allocates stuff. It might fix the issue but I kinda doubt it. https://cdn.discordapp.com/attachments/930865463318179952/1017897945271631922/chaiNNer-win32-x64-0.12.3.zip
The error still occurs with both versions: the 0.12.3 you posted above and the current 0.12.4 that has GPU selection. My Nvidia GPU shows as the only selectable option (GPU 0), but the images still do not process, and I still get the same long laundry list of vkQueueSubmit failures.
It's really not a huge deal for me, as I can still use the models themselves through the batch command normally as I always have.
And to be honest, I have no idea how most of this stuff works under the hood, though I'm trying to learn. I was more or less hoping it was user error and something stupid I had done or misconfigured on my end that would result in an easy fix.
https://github.com/HomeOfVapourSynthEvolution/VapourSynth-RIFE-ncnn-Vulkan/issues/2 https://github.com/xinntao/Real-ESRGAN/issues/106
See the above links for a workaround for TDR (Timeout Detection and Recovery) on Windows.
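For readers who don't want to chase the links: the TDR workaround those issues describe is raising Windows' GPU timeout via the registry, so long-running Vulkan kernels aren't killed by the driver watchdog. A sketch of the .reg file, assuming the commonly suggested 60-second delay (the `TdrDelay` value is in seconds and defaults to 2); edit the registry at your own risk and reboot afterwards:

```
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000003c
```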
@Jason-Bloomer @joeyballentine
@nihui thanks for the suggestion!
@Jason-Bloomer please let me know if that fix works
@Jason-Bloomer Is this still an issue? I think with the estimation it should be erroring far less, if at all now.
Sorry for the delay, but I haven't really had time to mess with this recently. I just updated to the most recent version (0.15.3) and, while I am no longer getting the "vkQueueSubmit" errors, I am still getting an error:
An error occurred in a Image File Iterator node:
Errors occurred during iteration:
• A critical error has occurred. You may need to restart chaiNNer in order for NCNN upscaling to start working again.
It now seems to produce appropriately-sized but all-black images for the images it was previously erroring on.
Sorry for the late reply.
I don't think there's really anything else I can do about this. I tried my best to work around NCNN's issues, but it seems to just hate some people's systems for some reason.
@SpaceMageWhatever
Is this not fixed yet? I used to be able to upscale everything, then everything just randomly broke for no reason. Sometimes I can get things to upscale, but the images have random black squares; most of the time it just randomly fails. It's super frustrating, as it used to work fine.
I can't do anything to fix this. It's an inherent problem with NCNN, Vulkan, and your GPU. If using the smallest possible tile size doesn't do it, then you're just out of luck.
Information:
Description: Errors occur after processing images with iteration, or when processing a single image by itself. It usually only happens when the image is larger than 512 pixels in at least one dimension. If I downscale the images first so they are never larger than 512px in any direction, I can process all images without problem. Sorry if this is a duplicate of an issue reported elsewhere; I looked and saw several people reporting similar problems, but with slightly different context, and mostly on macOS. This is occurring for me on Windows 10.
An error occurred in a Image File Iterator node:
Logs: renderer.log, main.log