Closed kattjevfel closed 5 years ago
I just ran the program with valgrind to try narrow it down further and this part seemed most interesting to me (though I'm not a programmer):
==30478== Argument 'size' of function malloc has a fishy (possibly negative) value: -1938260944
==30478== at 0x483777F: malloc (vg_replace_malloc.c:299)
==30478== by 0x48D4836: W2Mat::W2Mat(int, int, int) (in /usr/lib/libw2xc.so)
==30478== by 0x48C32B5: ??? (in /usr/lib/libw2xc.so)
==30478== by 0x48C9922: ??? (in /usr/lib/libw2xc.so)
==30478== by 0x48CAF53: w2xconv_convert_mat(W2XConv*, cv::Mat&, cv::Mat&, int, double, int, w2xconv_rgb_float3, bool, bool) (in /usr/lib/libw2xc.so)
==30478== by 0x48CD3D1: w2xconv_convert_file (in /usr/lib/libw2xc.so)
==30478== by 0x11A8C7: ??? (in /usr/bin/waifu2x-converter-cpp)
==30478== by 0x11770F: ??? (in /usr/bin/waifu2x-converter-cpp)
==30478== by 0x5BBE222: (below main) (in /usr/lib/libc-2.28.so)
==30478==
==30478== Invalid write of size 2
==30478== at 0x483E9DB: memmove (vg_replace_strmem.c:1270)
==30478== by 0x48C33CF: ??? (in /usr/lib/libw2xc.so)
==30478== by 0x48C9922: ??? (in /usr/lib/libw2xc.so)
==30478== by 0x48CAF53: w2xconv_convert_mat(W2XConv*, cv::Mat&, cv::Mat&, int, double, int, w2xconv_rgb_float3, bool, bool) (in /usr/lib/libw2xc.so)
==30478== by 0x48CD3D1: w2xconv_convert_file (in /usr/lib/libw2xc.so)
==30478== by 0x11A8C7: ??? (in /usr/bin/waifu2x-converter-cpp)
==30478== by 0x11770F: ??? (in /usr/bin/waifu2x-converter-cpp)
==30478== by 0x5BBE222: (below main) (in /usr/lib/libc-2.28.so)
==30478== Address 0x54 is not stack'd, malloc'd or (recently) free'd
==30478==
==30478==
==30478== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==30478== Access not within mapped region at address 0x54
==30478== at 0x483E9DB: memmove (vg_replace_strmem.c:1270)
==30478== by 0x48C33CF: ??? (in /usr/lib/libw2xc.so)
==30478== by 0x48C9922: ??? (in /usr/lib/libw2xc.so)
==30478== by 0x48CAF53: w2xconv_convert_mat(W2XConv*, cv::Mat&, cv::Mat&, int, double, int, w2xconv_rgb_float3, bool, bool) (in /usr/lib/libw2xc.so)
==30478== by 0x48CD3D1: w2xconv_convert_file (in /usr/lib/libw2xc.so)
==30478== by 0x11A8C7: ??? (in /usr/bin/waifu2x-converter-cpp)
==30478== by 0x11770F: ??? (in /usr/bin/waifu2x-converter-cpp)
==30478== by 0x5BBE222: (below main) (in /usr/lib/libc-2.28.so)
==30478== If you believe this happened as a result of a stack
==30478== overflow in your program's main thread (unlikely but
==30478== possible), you can try to increase the size of the
==30478== main thread stack using the --main-stacksize= flag.
==30478== The main thread stack size used in this run was 8388608.
I'm running master-b4566c0
@YukihoAA This is beyond my c++ skills, do you have any idea why memcpy fails here? Is the pointer the src culprit? https://github.com/DeadSix27/waifu2x-converter-cpp/blob/master/src/convertRoutine.cpp#L184
It comes from here: https://github.com/DeadSix27/waifu2x-converter-cpp/blob/master/src/cvwrap.hpp#L94
@DeadSix27
When I was trying to implement upconv_7, there was some problem with W2Mat copy and ptr function same to me.
It works fine in most of the cases but I knew that function use the memory that is not reserved for W2Mat.
I think It might has problem with memset, or it can be problem of stack size limit.
I'll try to implement W2Mat in my way but I cannot guaranty whether it works of not.
As we can see there is too many memory problem with W2Mat.
How can we return local variable in C++?? it is not JAVA! (std::move is actually do nothing)
usually this is very short period of lifetime cycle and It contains not much big size of information so it won't cause problems in most of the time, but there is chance to some other function to write over the information that we need.
I think we need to re produce W2Mat class.
I was working on this issue but I have some problem with my job. someone may continue my work : No_W2Mat branch (actually it still need w2mat. I got 1 compile error now)
segfault with master-a268359
no segfault with no_W2Mat-6a96ec9
result of no_W2Mat-6a96ec9
951x938 => 7608x7504 (x2 scaled for 4 times) works fine.
not working for 7608x7504 =>15216x15008
result is changing every conversion.
so there is another issue with memory (it could be problem with stack size limit or someting)
but segfault is fixed.
It works for 6500x6500 but not working at 6800x6800 so limit should between them
result 6800x6800 / 7000x7000 (clean for 6500)
As #145 mentioned, It is problem with image size limit is INT_MAX.
so, output image has to be under size 13370x13370. otherwise it will cause wrong result or segfalt.
We will make some other PRs to support this issue may be split images in 4.. or more pieces and scale it. after operation done, combine them and output image comes out.
or
re-implement all functions to support long long but it may unsafe to do it
i'd recommend you to split images in small pieces and scale them separately until we make changes.
EDIT: I will divide input images in 4^n pieces when width is bigger than 6000/12000/18000
[Test reault of master]
6685 x2.0 ok (input: 44,689,225, output: 178,756,920) 1-1. 6685 x3.0 crash (input: 44,689,225, output: 402,223,025, max: 715,027,600)
6686 x2.0 crash (input: 44,702,596, output: 178,810,384)
4460 x 3.0 bad result (input: 19,891,600, output: 179,024,400, max: 318,265,600)
4450 x 3.0 bad result (input: 19,802,500, output: 178,222,500, max: 316,840,000)
3350 x 3.0 bad result (input: 11,222,500, output: 101,002,500, max: 179,560,000)
3344 x 3.0 crash (input: 11,182,336, output: 100,641,024. max: 178,917,424)
3343 x 3.0 crash (input: 11,175,649, output: 100,580,841. max: 178,810,384)
3342 x 3.0 ok (input: 11,168,964, output: 100,520,676. max: 178,703,424)
3342 x 4.0 ok (input: 11,168,964, output: 178,703,424. max: 178,703,424)
3342 x 4.1 crash (input: 11,168,964, output: 187,750,284. max: 714,813,696)
1672 x 4.1 crash (input: 2,795,584, output: 46.993.767, max: 178,917,376)
1671 x 4.1 ok (input: 2,795,584, output: 46,937,571, max: 178,703,424)
1671 x 8.0 ok (input: 2,795,584, output: 178,703,424, max: 178,703,424)
836 x 9.0 crash (input: 698,896, output: 56,610,576, max: 178,917,376)
835 x 9.0 ok (input: 697,225, output: 56,475,225, max: 178,489,600)
[ if convert 836 x8.0 first and x1.125 (= x16.0 => x0.5625 ) ] step 1. 836 x 8.0 ok (input:698,896, output: 44,729,344, max: 44,729,344) step 2. 6688 x 1.125 crash (input: 44,729,344, output: 56,610,576, max: 178,917,376) (time spent: 28.717 + crash)
[ if convert 836 x5.0 first and x1.8 (= x8.0 => x0.625 => x2.0 => x0.9 ) ] step 1. 836 x 5.0 ok (input: 698,896, output: 17,472,400, max: 44,729,344) step 2. 4180 x 1.8 ok (input: 17,472,400, output: 56,610,576, max: 69,889,600) output: 7524 x 7524 (time spent: 26.075 + 41.171)
[ if convert 836 x4.5 first and x2.0 (= x8.0 => x0.5625 => x2.0 ) ] step 1. 836 x 4.5 ok (input: 698,896, output: 14,152,644, max: 44,729,344) step 2. 3762 x 2.0 ok (input: 14,152,644, output: 56,610,576, max: 56,610,576) output: 7524 x 7524 (time spent: 25.672 + 34.556)
[ if convert 836 x2.25 first and x4.0 (= x4.0 => x0.5625 => x4.0 ) ] step 1. 836 x 2.25 ok (input: 698,896, output: 3,538,161, max: 11,182,336) step 2. 1881 x 4.0ok (input: 3,538,161, output: 56,610,576, max: 56,610,576) output: 7524 x 7524 (time spent: 6.445 + 35.709)
[ if convert 836 x1.125 first and x8.0 (= x2.0 => x0.5625 => x8.0 ) ] step 1. 836 x 1.125 ok (input: 698,896, output: 884,540, max: 2,795,584) step 2. 940 x 8 ok (input: 883,600, output: 56,550,400, max: 56,550,400) *output: 7520 x 7520 (time spent: 1.630 + 36.093)
you can see w2x wastes memory and time when it converting over x8, it will increase even more when x16 or x32
When trying to upscale an image of 6416x8592 resolution I got a segmentation fault , to easily be able to reproduce this issue I created a 7000x7000 file with imagemagick:
Full output without
--silent
I'm fairly certain it has nothing to do with running out of memory as there's plenty available:
VRAM/GPU ruled out, still crashes when using
--disable-gpu
: https://gist.github.com/Kattus/4dd7871633d2bb89760426c31b99881fI could upscale this and similar images just fine using waifu2x-caffe, but I am no longer using Windows so it is not an option unfortunately. Any help with this would be greatly appreciated, even if perhaps just a workaround.