xRoyBx opened 3 years ago
I think you didn't build the packages correctly; make sure you have PyTorch >=1.0.0, <=1.4.0. If you're just interested in doing the interpolation, check my repo iBobbyTS/VFIN. It's very easy to use, DAIN is included, there are Colab notebooks that I tested, and there's also a tar file with everything (Python, PyTorch and DAIN packages) installed correctly. Open an issue there if there is any problem with it.
I don't know how to install/change PyTorch >=1.0.0, <=1.4.0 in Colab (Win10). Anyway, using the "official" notebook by Styler00Dollar and Alpha or other related notebooks, I get the same error. I'll try VFIN, thanks ;)
To install pytorch 1.4, simply run this:
pip install torch==1.4.0
If you're using a notebook like Colab, add a ! before it, like
!pip install torch==1.4.0
I'm modifying the code in VFIN very often now, so there might be errors when someone else uses it. I'm still learning about GitHub; I might start using the branch and release systems to keep stable versions and somewhere else to develop.
This error has popped up a few times in the issues section of various DAIN colab repos. I have read it's possibly something with the GPU build. I would love for someone to look into this because I am not familiar at all with this area of coding and I've only gotten DAIN to work once (probably when I was assigned a P100). I get this same error when using different colabs for DAIN, specifically when I get a V100 I think.
Has anyone found a solution???
Hi! Yes, I know what the error is. I don’t know exactly how to fix it.
This is the relevant error line:
error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device
This means the native CUDA module was not compiled for the GPU it is actually running on. As Google keeps adding new GPU models to Colab, we will keep hitting these issues. This explains the symptoms that @TaoTeCha is seeing.
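To make the mismatch concrete, here is a hedged sketch: the table maps GPUs commonly handed out by Colab to their NVIDIA-published compute capabilities, and the helper builds the nvcc flags a matching kernel image needs. `COLAB_GPU_SM` and `gencode_flags` are illustrative names of my own, not part of DAIN.

```python
# Illustrative mapping from Colab GPU names to CUDA compute
# capabilities (NVIDIA's published values, not DAIN code).
COLAB_GPU_SM = {
    "Tesla K80": "37",
    "Tesla P4": "61",
    "Tesla P100": "60",
    "Tesla T4": "75",
    "Tesla V100": "70",
}

def gencode_flags(gpu_name):
    """Return the nvcc -gencode flags that build a kernel image for this GPU."""
    sm = COLAB_GPU_SM[gpu_name]
    return ["-gencode", f"arch=compute_{sm},code=sm_{sm}"]

# If the module was built with only sm_60 flags (P100) and Colab hands
# out a V100 (sm_70), "no kernel image is available" is the result.
print(gencode_flags("Tesla V100"))
```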
My first attempt at this was achieved in #87, where I manually added a bunch of models for all the GPUs I could find in Colab at that time (June 2020). I also added a structure that hopefully made it easier to add more over time... but it's less than ideal.
@xRoyBx Note that the version of the Colab with those fixes also suppresses a few warnings that I saw in your logs. Is it possible you’re not using the latest one? 1.5 is already in master, 1.5.1 is in a PR (#116).
If anyone knows how to achieve future general compatibility, I’d be glad to work on that.
I've trained and run a handful of deep learning models in Colab, but in every case the GPU has been all set up and ready to go, so I am totally ignorant about all this. I have a couple of questions.
1) Why do you need to do a 15-minute 'build' with DAIN when I have never had to do this with any other model I've used?
2) What parameters do I need to change in the files to find a model that works with Colab's V100? I'm willing to put in the trial-and-error work if someone enlightens me on what I should be changing.
Thanks
DAIN is a mixture of different CNNs put together, some of them from previous papers. You can find more info here and in the original paper. So that you don't have to run 6 CNNs in parallel, which is memory-expensive and incredibly slow, the authors compiled some of these "layers" into CUDA modules they could run directly on the GPU to train and infer with DAIN. These modules are the ones taking ~15 minutes to build and giving us these headaches.
Check out this file.
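For context on what that 15-minute build is doing: the correlation package is compiled as a PyTorch CUDA extension. A minimal sketch of such a build script follows; the file names and the single -gencode pair are illustrative only (DAIN's actual setup files list many architectures), so treat this as the shape of the mechanism, not the exact contents.

```python
# Sketch of a PyTorch CUDA extension build script, in the style of
# DAIN's correlation package. The -gencode pair targets sm_70 (V100)
# only; a real build lists one pair per GPU it should support.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="correlation_cuda",
    ext_modules=[
        CUDAExtension(
            name="correlation_cuda",
            sources=["correlation_cuda.cc", "correlation_cuda_kernel.cu"],
            extra_compile_args={
                "cxx": [],
                "nvcc": ["-gencode", "arch=compute_70,code=sm_70"],
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

nvcc compiles the .cu kernel once per listed architecture, which is why each extra GPU model both lengthens the build and is required to avoid the "no kernel image" error on that GPU.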
Not sure if it's a coincidence, but I uncommented '-gencode', 'arch=compute_70,code=sm_70' in compiler_args.py and switched to !pip install torch==1.0.0 torchvision==0.2.1
The colab is working with a V100 now. I'll probably use this a handful of times over the next week and I'll keep you updated if it continues to work.
Thanks!
Thanks for the interpolation fix. Unfortunately I get this error when creating the output video; apparently it doesn't create the output frames:
CalledProcessError Traceback (most recent call last)
Are you sure your output path exists? Do you have a folder in your drive named DAIN? When you mounted you drive, did you mount it as gdrive or just drive? Try changing the output to '/content/output.mp4' and just download from the colab file folder.
Or try !ffmpeg instead of %shell ffmpeg
Paths are OK, the DAIN folder is present, drive mounted as gdrive. The problem is always the same (forget my previous post, sorry): it doesn't create the output PNG frames even though the output folder is present (using a Tesla V100 in Colab).
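Before pointing at ffmpeg, it may help to confirm whether DAIN wrote anything at all. A small hedged check follows; the directory path is the `frame_output_dir` from the notebook's log output, and `count_frames` is my own helper, not part of DAIN.

```python
# Hedged helper: count the PNG frames DAIN wrote, so an empty output
# directory is caught before ffmpeg fails on it.
from pathlib import Path

def count_frames(output_dir):
    """Return how many .png frames exist in output_dir (0 if missing)."""
    d = Path(output_dir)
    if not d.is_dir():
        return 0
    return len(list(d.glob("*.png")))

n = count_frames("/content/DAIN/output_frames")
if n == 0:
    print("No frames written: interpolation failed before encoding.")
else:
    print(f"{n} frames ready for ffmpeg.")
```

If this prints zero, the ffmpeg step is a red herring and the failure is upstream, in the interpolation itself.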
In VFIN you don't need to worry about that. Just specify -ot video and it will generate an mp4 in your input folder; if you specify the output with -o, you can use any extension and save it anywhere.
using this command:
!/content/python/bin/python3 /content/VFIN/run.py -i "/content/drive/My Drive/VFIN/input.mp4" -o "/content/drive/My Drive/VFIN/output.mp4"
i get this error
/content/python/bin/python3: can't open file '/content/VFIN/run.py': [Errno 2] No such file or directory
Sorry, I changed the name of the running file; for now, use
!/content/python/bin/python3 /content/VFIN/run_class.py -i "/content/drive/My Drive/VFIN/input.mp4" -o "/content/drive/My Drive/VFIN/output.mp4"
instead.
I fixed the GitHub repo and the pre-built tar file; copy the tar file to your drive again and use it next time, and copy the notebook too, since I edited it.
By the way, you need -a DAIN -ot video to make it use DAIN and output a video. For any problem with VFIN, please open an issue there.
Hello, I always get this error during interpolation and can't proceed:
/content/DAIN
revise the unique id to a random numer 68776
Namespace(SAVED_MODEL=None, alpha=[0.0, 1.0], arg='./model_weights/68776-Tue-Nov-10-11-46/args.txt', batch_size=1, channels=3, ctx_lr_coe=1.0, datasetName='Vimeo_90K_interp', datasetPath='', dataset_split=97, debug=False, depth_lr_coe=0.001, dtype=<class 'torch.cuda.FloatTensor'>, end_frame=1259, epsilon=1e-06, factor=0.2, filter_lr_coe=1.0, filter_size=4, flow_lr_coe=0.01, force=False, frame_input_dir='/content/DAIN/input_frames', frame_output_dir='/content/DAIN/output_frames', log='./model_weights/68776-Tue-Nov-10-11-46/log.txt', lr=0.002, netName='DAIN_slowmotion', no_date=False, numEpoch=100, occ_lr_coe=1.0, patience=5, rectify_lr=0.001, save_path='./model_weights/68776-Tue-Nov-10-11-46', save_which=1, seed=1, start_frame=1, time_step=0.5, uid=None, use_cuda=True, use_cudnn=1, weight_decay=0, workers=8)
cudnn is used
Interpolate 1 frames
error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device
Warning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function) (THPFunction_do_forward at /pytorch/torch/csrc/autograd/python_function.cpp:622)
Traceback (most recent call last):
File "colab_interpolate.py", line 112, in
y_s, offset, filter = model(torch.stack((X0, X1),dim = 0))
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/content/DAIN/networks/DAIN_slowmotion.py", line 148, in forward
self.forward_flownets(self.flownets, cur_offset_input, time_offsets=time_offsets),
File "/content/DAIN/networks/DAIN_slowmotion.py", line 212, in forward_flownets
temp = model(input) # this is a single direction motion results, but not a bidirectional one
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/content/DAIN/PWCNet/PWCNet.py", line 221, in forward
corr6 = self.corr(c16, c26)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/content/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 59, in forward
result = CorrelationFunction(self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)(input1, input2)
File "/content/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 27, in forward
self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7f3c81ae3193 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: correlation_forward_cuda(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, int, int, int, int, int, int) + 0x628 (0x7f3c7e117b38 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: + 0x1bd4a (0x7f3c7e127d4a in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: + 0x18890 (0x7f3c7e124890 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #4: python3() [0x50a4a5]