DeadSix27 / waifu2x-converter-cpp

Improved fork of Waifu2X C++ using OpenCL and OpenCV
MIT License
792 stars 86 forks source link

Upscaling very large images got crashed result #122

Closed kattjevfel closed 5 years ago

kattjevfel commented 5 years ago

When trying to upscale an image of 6416x8592 resolution I got a segmentation fault , to easily be able to reproduce this issue I created a 7000x7000 file with imagemagick:

~ » convert -size 7000x7000 xc:white white.png && waifu2x-converter-cpp -s -i white.png
CUDA: GeForce GTX 1080
[1]    29734 segmentation fault (core dumped)  waifu2x-converter-cpp -s -i white.png

Full output without --silent

I'm fairly certain it has nothing to do with running out of memory as there's plenty available: screenshot

VRAM/GPU ruled out, still crashes when using --disable-gpu: https://gist.github.com/Kattus/4dd7871633d2bb89760426c31b99881f

I could upscale this and similar images just fine using waifu2x-caffe, but I am no longer using Windows so it is not an option unfortunately. Any help with this would be greatly appreciated, even if perhaps just a workaround.

kattjevfel commented 5 years ago

I just ran the program with valgrind to try narrow it down further and this part seemed most interesting to me (though I'm not a programmer):

==30478== Argument 'size' of function malloc has a fishy (possibly negative) value: -1938260944
==30478==    at 0x483777F: malloc (vg_replace_malloc.c:299)
==30478==    by 0x48D4836: W2Mat::W2Mat(int, int, int) (in /usr/lib/libw2xc.so)
==30478==    by 0x48C32B5: ??? (in /usr/lib/libw2xc.so)
==30478==    by 0x48C9922: ??? (in /usr/lib/libw2xc.so)
==30478==    by 0x48CAF53: w2xconv_convert_mat(W2XConv*, cv::Mat&, cv::Mat&, int, double, int, w2xconv_rgb_float3, bool, bool) (in /usr/lib/libw2xc.so)
==30478==    by 0x48CD3D1: w2xconv_convert_file (in /usr/lib/libw2xc.so)
==30478==    by 0x11A8C7: ??? (in /usr/bin/waifu2x-converter-cpp)
==30478==    by 0x11770F: ??? (in /usr/bin/waifu2x-converter-cpp)
==30478==    by 0x5BBE222: (below main) (in /usr/lib/libc-2.28.so)
==30478== 
==30478== Invalid write of size 2
==30478==    at 0x483E9DB: memmove (vg_replace_strmem.c:1270)
==30478==    by 0x48C33CF: ??? (in /usr/lib/libw2xc.so)
==30478==    by 0x48C9922: ??? (in /usr/lib/libw2xc.so)
==30478==    by 0x48CAF53: w2xconv_convert_mat(W2XConv*, cv::Mat&, cv::Mat&, int, double, int, w2xconv_rgb_float3, bool, bool) (in /usr/lib/libw2xc.so)
==30478==    by 0x48CD3D1: w2xconv_convert_file (in /usr/lib/libw2xc.so)
==30478==    by 0x11A8C7: ??? (in /usr/bin/waifu2x-converter-cpp)
==30478==    by 0x11770F: ??? (in /usr/bin/waifu2x-converter-cpp)
==30478==    by 0x5BBE222: (below main) (in /usr/lib/libc-2.28.so)
==30478==  Address 0x54 is not stack'd, malloc'd or (recently) free'd
==30478== 
==30478== 
==30478== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==30478==  Access not within mapped region at address 0x54
==30478==    at 0x483E9DB: memmove (vg_replace_strmem.c:1270)
==30478==    by 0x48C33CF: ??? (in /usr/lib/libw2xc.so)
==30478==    by 0x48C9922: ??? (in /usr/lib/libw2xc.so)
==30478==    by 0x48CAF53: w2xconv_convert_mat(W2XConv*, cv::Mat&, cv::Mat&, int, double, int, w2xconv_rgb_float3, bool, bool) (in /usr/lib/libw2xc.so)
==30478==    by 0x48CD3D1: w2xconv_convert_file (in /usr/lib/libw2xc.so)
==30478==    by 0x11A8C7: ??? (in /usr/bin/waifu2x-converter-cpp)
==30478==    by 0x11770F: ??? (in /usr/bin/waifu2x-converter-cpp)
==30478==    by 0x5BBE222: (below main) (in /usr/lib/libc-2.28.so)
==30478==  If you believe this happened as a result of a stack
==30478==  overflow in your program's main thread (unlikely but
==30478==  possible), you can try to increase the size of the
==30478==  main thread stack using the --main-stacksize= flag.
==30478==  The main thread stack size used in this run was 8388608.

full log

I'm running master-b4566c0

DeadSix27 commented 5 years ago

@YukihoAA This is beyond my c++ skills, do you have any idea why memcpy fails here? Is the pointer the src culprit? https://github.com/DeadSix27/waifu2x-converter-cpp/blob/master/src/convertRoutine.cpp#L184

It comes from here: https://github.com/DeadSix27/waifu2x-converter-cpp/blob/master/src/cvwrap.hpp#L94

YukihoAA commented 5 years ago

@DeadSix27

When I was trying to implement upconv_7, there was some problem with W2Mat copy and ptr function same to me.

It works fine in most of the cases but I knew that function use the memory that is not reserved for W2Mat.

I think It might has problem with memset, or it can be problem of stack size limit.

I'll try to implement W2Mat in my way but I cannot guaranty whether it works of not.

YukihoAA commented 5 years ago

As we can see there is too many memory problem with W2Mat.

How can we return local variable in C++?? it is not JAVA! (std::move is actually do nothing)

image

image

usually this is very short period of lifetime cycle and It contains not much big size of information so it won't cause problems in most of the time, but there is chance to some other function to write over the information that we need.

I think we need to re produce W2Mat class.

YukihoAA commented 5 years ago

I was working on this issue but I have some problem with my job. someone may continue my work : No_W2Mat branch (actually it still need w2mat. I got 1 compile error now)

YukihoAA commented 5 years ago

segfault with master-a268359 image

no segfault with no_W2Mat-6a96ec9 image

result of no_W2Mat-6a96ec9

951x938 => 7608x7504 (x2 scaled for 4 times) works fine.

image

not working for 7608x7504 =>15216x15008

image

image

result is changing every conversion.

so there is another issue with memory (it could be problem with stack size limit or someting)

but segfault is fixed.

YukihoAA commented 5 years ago

It works for 6500x6500 but not working at 6800x6800 so limit should between them

result 6800x6800 / 7000x7000 (clean for 6500) image

YukihoAA commented 5 years ago

As #145 mentioned, It is problem with image size limit is INT_MAX.

so, output image has to be under size 13370x13370. otherwise it will cause wrong result or segfalt.

We will make some other PRs to support this issue may be split images in 4.. or more pieces and scale it. after operation done, combine them and output image comes out.

or

re-implement all functions to support long long but it may unsafe to do it

i'd recommend you to split images in small pieces and scale them separately until we make changes.

EDIT: I will divide input images in 4^n pieces when width is bigger than 6000/12000/18000

YukihoAA commented 5 years ago

[Test reault of master]

  1. 6685 x2.0 ok (input: 44,689,225, output: 178,756,920) 1-1. 6685 x3.0 crash (input: 44,689,225, output: 402,223,025, max: 715,027,600)

  2. 6686 x2.0 crash (input: 44,702,596, output: 178,810,384)

  3. 4460 x 3.0 bad result (input: 19,891,600, output: 179,024,400, max: 318,265,600)

  4. 4450 x 3.0 bad result (input: 19,802,500, output: 178,222,500, max: 316,840,000)

  5. 3350 x 3.0 bad result (input: 11,222,500, output: 101,002,500, max: 179,560,000)

  6. 3344 x 3.0 crash (input: 11,182,336, output: 100,641,024. max: 178,917,424)

  7. 3343 x 3.0 crash (input: 11,175,649, output: 100,580,841. max: 178,810,384)

  8. 3342 x 3.0 ok (input: 11,168,964, output: 100,520,676. max: 178,703,424)

  9. 3342 x 4.0 ok (input: 11,168,964, output: 178,703,424. max: 178,703,424)

  10. 3342 x 4.1 crash (input: 11,168,964, output: 187,750,284. max: 714,813,696)

  11. 1672 x 4.1 crash (input: 2,795,584, output: 46.993.767, max: 178,917,376)

  12. 1671 x 4.1 ok (input: 2,795,584, output: 46,937,571, max: 178,703,424)

  13. 1671 x 8.0 ok (input: 2,795,584, output: 178,703,424, max: 178,703,424)

  14. 836 x 9.0 crash (input: 698,896, output: 56,610,576, max: 178,917,376)

  15. 835 x 9.0 ok (input: 697,225, output: 56,475,225, max: 178,489,600)

[ if convert 836 x8.0 first and x1.125 (= x16.0 => x0.5625 ) ] step 1. 836 x 8.0 ok (input:698,896, output: 44,729,344, max: 44,729,344) step 2. 6688 x 1.125 crash (input: 44,729,344, output: 56,610,576, max: 178,917,376) (time spent: 28.717 + crash)

[ if convert 836 x5.0 first and x1.8 (= x8.0 => x0.625 => x2.0 => x0.9 ) ] step 1. 836 x 5.0 ok (input: 698,896, output: 17,472,400, max: 44,729,344) step 2. 4180 x 1.8 ok (input: 17,472,400, output: 56,610,576, max: 69,889,600) output: 7524 x 7524 (time spent: 26.075 + 41.171)

[ if convert 836 x4.5 first and x2.0 (= x8.0 => x0.5625 => x2.0 ) ] step 1. 836 x 4.5 ok (input: 698,896, output: 14,152,644, max: 44,729,344) step 2. 3762 x 2.0 ok (input: 14,152,644, output: 56,610,576, max: 56,610,576) output: 7524 x 7524 (time spent: 25.672 + 34.556)

[ if convert 836 x2.25 first and x4.0 (= x4.0 => x0.5625 => x4.0 ) ] step 1. 836 x 2.25 ok (input: 698,896, output: 3,538,161, max: 11,182,336) step 2. 1881 x 4.0ok (input: 3,538,161, output: 56,610,576, max: 56,610,576) output: 7524 x 7524 (time spent: 6.445 + 35.709)

[ if convert 836 x1.125 first and x8.0 (= x2.0 => x0.5625 => x8.0 ) ] step 1. 836 x 1.125 ok (input: 698,896, output: 884,540, max: 2,795,584) step 2. 940 x 8 ok (input: 883,600, output: 56,550,400, max: 56,550,400) *output: 7520 x 7520 (time spent: 1.630 + 36.093)

you can see w2x wastes memory and time when it converting over x8, it will increase even more when x16 or x32