unexpected error with real esrgan ncnn model

jaredmontoya commented 2 years ago

Information:

Chainner version: [0.13.0]
OS: [Manjaro Linux]
RAM: 8GB

Description: Even when the GPU is selected for use with NCNN, it uses the CPU and fails.

Screenshot_20221003_225237 I used the Real-ESRGAN anime model because the regular x4plus model always fails with vkQueueSubmit failed, even when I run it as described in README.md: ./realesrgan-ncnn-vulkan -i inputs -o outputs -n realesrgan-x4plus. I can only run its PyTorch or ONNX version, and only on the CPU. The anime model works correctly when run the same way: ./realesrgan-ncnn-vulkan -i inputs -o outputs -n realesrgan-x4plus-anime

But when I use it in chaiNNer, I get the vkQueueSubmit failed error in the command line even with the anime model that otherwise runs fine. When I changed the tiling setting from auto to 4096, chaiNNer stopped reporting an unexpected error and instead started asking for more extreme tiling, but there is nothing more extreme to set than 4096.

Screenshot_20221003_225223 I set my integrated GPU as the NCNN device.

Screenshot_20221003_225201 I started chaiNNer from the command line to see the logs. They say device: cpu and that fp16 is set to False, so I think defaulting to the CPU may be the root of the problem. If it only says cpu because my GPU is integrated, then I don't know what causes the problem. The only other thing I can think of is that Real-ESRGAN's NCNN runtime is modified.

Logs Archive.zip

joeyballentine commented 2 years ago

The vkQueueSubmit failed error strikes again... Make sure you have the right graphics drivers installed. I don't really know anything about integrated graphics, but based on a little googling it seems like this integrated GPU is too old to have Vulkan drivers on Windows -- but since you're on Linux I think you should be able to get it to work. However, since it seems like it's pretty old, it might just have issues no matter what. Unfortunately, this isn't really an error I can solve, as it just seems to be a problem with the official ncnn python bindings (which the xxxxx-ncnn-vulkan projects don't use). I wish I knew what to do to solve this.

jaredmontoya commented 2 years ago

thanks for the explanation

joeyballentine commented 2 years ago

Actually, taking a look again and thinking about it more I've made a realization: you definitely can't upscale this as your integrated graphics card uses your system ram, not your vram. So if it's getting to the point where it's using up all your ram that probably means it's just too much for your PC to handle. How big of an image are you trying to upscale?

RunDevelopment commented 2 years ago

@joeyballentine Is this an issue that could be solved with more aggressive tiling?

joeyballentine commented 2 years ago

@RunDevelopment if it's really using system ram then no, because we hold onto the tiles in ram. If anything tiling would just make it worse

jaredmontoya commented 2 years ago

Actually, taking a look again and thinking about it more I've made a realization: you definitely can't upscale this as your integrated graphics card uses your system ram, not your vram. So if it's getting to the point where it's using up all your ram that probably means it's just too much for your PC to handle. How big of an image are you trying to upscale?

Screenshot_20221004_195221

Screenshot_20221004_195107

The image that I tried to upscale was 320x768. If I assume that "cpu" in the chaiNNer console logs means the CPU and not the integrated GPU, then, as I showed in the third screenshot, NCNN for some reason uses the CPU during upscaling and therefore disables fp16, instead of using the integrated GPU that I selected in my second screenshot. My GPU may run out of RAM with the heavier x4plus model, but it can still handle the x4plus-anime model when run by the realesrgan-ncnn-vulkan project. When I run that, the console says fp16 is on. As far as I know, a CPU can't do fp16, so realesrgan-ncnn-vulkan is definitely using the GPU as intended, does not default to the CPU like the NCNN runtime in chaiNNer, and runs the x4plus-anime model on my GPU without any problems.

So what I am trying to say is that maybe NCNN in chaiNNer is defaulting to the CPU, either because of the broken official bindings (like you said) or for some other reason, and that using the CPU instead of the GPU is the cause of the error. Maybe if it used my GPU and fp16, it would run normally, as it does with realesrgan-ncnn-vulkan -n realesrgan-x4plus-anime. After all, fp16 is really good at lowering VRAM (in my case RAM) usage, and that could be the reason why realesrgan-ncnn-vulkan -n realesrgan-x4plus-anime doesn't run out of memory.
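To put rough numbers on the fp16 point, here is a minimal back-of-the-envelope sketch in Python. It only counts the raw size of a few dense tensors for a 320x768 RGB image upscaled 4x. The 64-channel feature map is an illustrative assumption, and real peak usage also depends on the model's depth, the allocator, and the driver, none of which are modelled here.

```python
# Rough tensor sizes for a 320x768 RGB image upscaled 4x.
# FEATURE_CHANNELS is an illustrative assumption, not a value taken from the logs.

def tensor_bytes(width, height, channels, bytes_per_value):
    """Size in bytes of a dense width x height x channels tensor."""
    return width * height * channels * bytes_per_value

def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)

WIDTH, HEIGHT = 320, 768
SCALE = 4
FEATURE_CHANNELS = 64  # assumed width of one intermediate feature map

for label, bytes_per_value in (("fp32", 4), ("fp16", 2)):
    input_size = tensor_bytes(WIDTH, HEIGHT, 3, bytes_per_value)
    feature_size = tensor_bytes(WIDTH, HEIGHT, FEATURE_CHANNELS, bytes_per_value)
    output_size = tensor_bytes(WIDTH * SCALE, HEIGHT * SCALE, 3, bytes_per_value)
    print(
        f"{label}: input {to_mib(input_size):.1f} MiB, "
        f"one feature map {to_mib(feature_size):.1f} MiB, "
        f"output {to_mib(output_size):.1f} MiB"
    )
```

Every tensor is exactly half the size in fp16, so whatever the true peak is, storing values as fp16 roughly halves the part of it that is made up of tensors.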

joeyballentine commented 2 years ago

Sorry, the CPU and fp16 you see in the logs are for pytorch. I keep forgetting to remove those (they log no matter what which is confusing).

NCNN always uses your GPU, as we don't even have code to support CPU upscaling for it. I think the code in realesrgan's CLI is doing something that we aren't, or that the python bindings are not doing. I think @theflyingzamboni found something that was different between them a while back, but we don't really know enough about how ncnn works to properly integrate that.

joeyballentine commented 2 years ago

Also, you still get this error using the RealESRGAN CLI -- does it actually finish processing?

jaredmontoya commented 2 years ago

Also, you still get this error using the RealESRGAN CLI -- does it actually finish processing?

realesrgan-ncnn-vulkan finishes processing with 0 errors when using the x4plus-anime model, and as far as I remember it also works with the realsr anime video models. But if you are asking about the x4plus model that I had trouble with before, then yes, it still outputs the vkQueueSubmit error, as I didn't change or fix anything. I can only guess how NCNN works based on logs and search results. I don't know how it works either, because I am too scared to try to understand the entire NCNN codebase, and especially to mess with complex C/C++ code, since I am really bad at those languages. I only know Python and Go, and recently started learning Rust.

joeyballentine commented 2 years ago

@HACCKKER would you mind testing the latest release (0.14) and seeing if this is still an issue?

jaredmontoya commented 2 years ago

ok, I will try

jaredmontoya commented 2 years ago

Sad news: same errors in the terminal, vkWaitForFences and vkQueueSubmit errors like before. I used an image similar in resolution and type to the one I used before. Now the output is always this: Screenshot_20221028_214625 logs.zip

jaredmontoya commented 2 years ago

Also, I noticed that it now crashes after the CPU load reaches 100% and then rapidly drops back to what it was before.

joeyballentine commented 2 years ago

I'm guessing then that it is still just related to the fact that you are using integrated graphics... I'm afraid there's nothing else I can do at this point

joeyballentine commented 2 years ago

I just wish I knew what RealESRGAN's CLI was doing differently.... Last I checked, we basically do all the exact same things as it. Really the only difference is it uses extremely small tile sizes
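For a sense of what "extremely small tile sizes" means for the 320x768 image from this thread, here is a minimal Python sketch of a plain tile grid. The tile sizes are arbitrary examples, not the values used by RealESRGAN's CLI or by chaiNNer, and the overlap or padding that real tilers add around each tile is left out.

```python
import math

def tile_grid(width, height, tile_size):
    """Number of tile columns and rows needed to cover the image."""
    return math.ceil(width / tile_size), math.ceil(height / tile_size)

def iter_tiles(width, height, tile_size):
    """Yield (x, y, tile_width, tile_height) for each tile, clipped at the edges."""
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield x, y, min(tile_size, width - x), min(tile_size, height - y)

WIDTH, HEIGHT = 320, 768

for tile_size in (32, 100, 200, 4096):  # example sizes only
    cols, rows = tile_grid(WIDTH, HEIGHT, tile_size)
    largest = max(w * h for _, _, w, h in iter_tiles(WIDTH, HEIGHT, tile_size))
    print(f"tile size {tile_size}: {cols}x{rows} tiles, largest tile {largest} px")
```

Smaller tiles shrink the largest piece the GPU has to process in one submission, but the tiles together still cover the whole image, so the assembled result takes the same amount of memory either way.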

jaredmontoya commented 2 years ago

I'm guessing then that it is still just related to the fact that you are using integrated graphics... I'm afraid there's nothing else I can do at this point

you are probably right.

chaiNNer-org / chaiNNer

unexpected error with real esrgan ncnn model #1057