baowenbo / DAIN

Depth-Aware Video Frame Interpolation (CVPR 2019)
https://sites.google.com/view/wenbobao/dain
MIT License
8.18k stars 839 forks

Colab pro error Interpolation #98

Open AlexU225 opened 4 years ago

AlexU225 commented 4 years ago

https://colab.research.google.com/github/AhabbscienceStudioPak/DAIN/blob/master/DAIN_Colab.ipynb#scrollTo=LH7EmLT2gA4l

Colab Pro assigned this GPU:

name, driver_version, memory.total [MiB]
Tesla V100-SXM2-16GB, 418.67, 16130 MiB

interpolation

/content/DAIN
revise the unique id to a random numer 91876
Namespace(SAVED_MODEL=None, alpha=[0.0, 1.0], arg='./model_weights/91876-Thu-Sep-03-17-38/args.txt', batch_size=1, channels=3, ctx_lr_coe=1.0, datasetName='Vimeo_90K_interp', datasetPath='', dataset_split=97, debug=False, depth_lr_coe=0.001, dtype=<class 'torch.cuda.FloatTensor'>, end_frame=137, epsilon=1e-06, factor=0.2, filter_lr_coe=1.0, filter_size=4, flow_lr_coe=0.01, force=False, frame_input_dir='/content/DAIN/input_frames', frame_output_dir='/content/DAIN/output_frames', log='./model_weights/91876-Thu-Sep-03-17-38/log.txt', lr=0.002, netName='DAIN_slowmotion', no_date=False, numEpoch=100, occ_lr_coe=1.0, patience=5, rectify_lr=0.001, save_path='./model_weights/91876-Thu-Sep-03-17-38', save_which=1, seed=1, start_frame=1, time_step=0.2997002997002997, uid=None, use_cuda=True, use_cudnn=1, weight_decay=0, workers=8)
cudnn is used
Interpolate 2 frames
error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device
Warning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function) (THPFunction_do_forward at /pytorch/torch/csrc/autograd/python_function.cpp:622)
Traceback (most recent call last):
  File "colab_interpolate.py", line 112, in <module>
    y_s, offset, filter = model(torch.stack((X0, X1),dim = 0))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/DAIN/networks/DAIN_slowmotion.py", line 148, in forward
    self.forward_flownets(self.flownets, cur_offset_input, time_offsets=time_offsets),
  File "/content/DAIN/networks/DAIN_slowmotion.py", line 212, in forward_flownets
    temp = model(input) # this is a single direction motion results, but not a bidirectional one
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/DAIN/PWCNet/PWCNet.py", line 221, in forward
    corr6 = self.corr(c16, c26)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 59, in forward
    result = CorrelationFunction(self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)(input1, input2)
  File "/content/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 27, in forward
    self.pad_size, self.kernel_size, self.max_displacement,self.stride1, self.stride2, self.corr_multiply)
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x33 (0x7fc469e26193 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: correlation_forward_cuda(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, int, int, int, int, int, int) + 0x628 (0x7fc46625ab38 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #2: + 0x1bd4a (0x7fc46626ad4a in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: + 0x18890 (0x7fc466267890 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #4: python3() [0x50a7f5]
frame #7: python3() [0x594b01]
frame #9: THPFunction_do_forward(THPFunction*, _object*) + 0x4ac (0x7fc4b2e37d4c in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #11: python3() [0x54ac61]
frame #13: python3() [0x50a783]
frame #16: python3() [0x594b01]
frame #19: python3() [0x507f24]
frame #21: python3() [0x594b01]
frame #22: python3() [0x54ac61]
frame #24: python3() [0x50a783]
frame #26: python3() [0x507f24]
frame #28: python3() [0x594b01]
frame #31: python3() [0x507f24]
frame #33: python3() [0x594b01]
frame #34: python3() [0x54ac61]
frame #36: python3() [0x50a783]
frame #38: python3() [0x507f24]
frame #39: python3() [0x509c50]
frame #40: python3() [0x50a64d]
frame #42: python3() [0x507f24]
frame #44: python3() [0x594b01]
frame #47: python3() [0x507f24]
frame #49: python3() [0x594b01]
frame #50: python3() [0x54ac61]
frame #52: python3() [0x50a783]
frame #54: python3() [0x507f24]
frame #56: python3() [0x634dd2]
frame #61: __libc_start_main + 0xe7 (0x7fc4be047b97 in /lib/x86_64-linux-gnu/libc.so.6)

Please tell me how to deal with this error.

tianchengdw commented 4 years ago

I have the same problem.

AlphaGit commented 4 years ago

Hi there! You seem to be using an old version of the Colab file. I believe the repository has also changed a few minor things about the interpolation, so in your situation I'd try the new version. You can find it here: https://github.com/baowenbo/DAIN/blob/master/Colab_DAIN.ipynb

AlexU225 commented 4 years ago

Using this Colab (https://github.com/baowenbo/DAIN/blob/master/Colab_DAIN.ipynb), Google Drive connected successfully, but an error occurred in the FPS detection block. Sorry for my English, I'm using a translator.

[Screenshot]

cp: cannot stat '/content/gdrive/My Drive//content/gdrive/My': No such file or directory
cp: cannot stat 'Drive/Pexels': No such file or directory
cp: cannot stat 'Videos': No such file or directory
cp: cannot stat '2759484.mp4': No such file or directory


CalledProcessError                        Traceback (most recent call last)
in ()
      1 # Detecting FPS of input file.
----> 2 get_ipython().magic('shell yes | cp -f /content/gdrive/My\\ Drive/{INPUT_FILEPATH} /content/DAIN/')
      3
      4 import os
      5 filename = os.path.basename(INPUT_FILEPATH)

3 frames
/usr/local/lib/python3.6/dist-packages/google/colab/_system_commands.py in check_returncode(self)
    136     if self.returncode:
    137       raise subprocess.CalledProcessError(
--> 138           returncode=self.returncode, cmd=self.args, output=self.output)
    139
    140   def _repr_pretty_(self, p, cycle):  # pylint:disable=unused-argument

CalledProcessError: Command 'yes | cp -f /content/gdrive/My\ Drive//content/gdrive/My Drive/Pexels Videos 2759484.mp4 /content/DAIN/' returned non-zero exit status 1.

AlphaGit commented 4 years ago

@AlexU225 Hi, the error is simply that it's not finding the file path. See the error you got:

cp: cannot stat '/content/gdrive/My Drive//content/gdrive/My': No such file or directory
cp: cannot stat 'Drive/Pexels': No such file or directory
cp: cannot stat 'Videos': No such file or directory
cp: cannot stat '2759484.mp4': No such file or directory

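The notebook already prepends /content/gdrive/My Drive/ to the path you enter, so the prefix ended up doubled. In the parameters, instead of /content/gdrive/My Drive/Pexels... you should use just Pexels... (the path relative to My Drive).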
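
The cp output above also shows the file name being split at each space, so a name like "Pexels Videos 2759484.mp4" may still fail even with the prefix removed. A minimal sketch of a copy step that tolerates both problems, assuming the mount point and destination shown in the traceback; `resolve_drive_path` and `copy_input` are hypothetical helpers, not part of the notebook:

```python
import shutil
from pathlib import Path

# Mount point taken from the traceback in this thread; adjust if yours differs.
DRIVE_ROOT = Path("/content/gdrive/My Drive")


def resolve_drive_path(input_filepath: str, drive_root: Path = DRIVE_ROOT) -> Path:
    """Return the absolute source path for a file stored under My Drive.

    Accepts either a path relative to My Drive or one that already
    starts with the Drive prefix, so the prefix is never doubled.
    """
    candidate = Path(input_filepath)
    if candidate.is_absolute():
        try:
            # Strip an accidentally included Drive prefix.
            candidate = candidate.relative_to(drive_root)
        except ValueError:
            raise ValueError(
                f"{input_filepath!r} is not inside {drive_root}"
            ) from None
    return drive_root / candidate


def copy_input(input_filepath: str, dest_dir: str = "/content/DAIN") -> Path:
    """Copy the file without going through a shell, so spaces need no escaping."""
    src = resolve_drive_path(input_filepath)
    return Path(shutil.copy(src, dest_dir))
```

Calling `copy_input("Pexels Videos 2759484.mp4")` in a notebook cell would then stand in for the `cp` line; whether the later cells of the notebook cope with spaces in the file name is a separate question, so renaming the file without spaces may be the simpler route.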

AlexU225 commented 3 years ago

Thank you for your advice! But now there is an error in this block

  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.

AlphaGit commented 3 years ago

Hey @AlexU225 I'm glad you made it that far! Unfortunately, I cannot help you there. That seems like a problem with the image processing itself.

niuhuojian commented 3 years ago

I have the same problem with the Tesla V100-SXM2-16GB, but the P100-PCIE-16GB works.

RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80)

I tried the new version, but it still happens. Please tell me how to deal with this error.

mpriessner commented 3 years ago

Hello, I have the same problem as @niuhuojian, also with the Tesla V100-SXM2-16GB, which I am using on Google Colab. I get the following error when running the notebook from: https://github.com/baowenbo/DAIN/blob/master/Colab_DAIN.ipynb

[Screenshot: Capture]

I have already tried several things to fix it, using different combinations of CUDA, gcc and torch versions: CUDA 9.0, gcc 6.5, torch 1.0.0; CUDA 9.0, gcc 6.5, torch 1.1.0; CUDA 9.0, gcc 4.8, PyTorch 0.4.1; CUDA 10.0, gcc 7.5, torch 1.4; CUDA 10.1, gcc 7.5, torch 1.6. None of them worked for me.

I also tried the solution from CyFeng16 in issue #44, but that seems to have stopped working too.

CUDA 9.0 with gcc-4.8 and g++-4.8 used to work around 4 months ago. Now this combination, as well as some of the others, gives me the FilterInterpolation module error from the my_package folder, see below:

Traceback (most recent call last):
  File "train.py", line 15, in <module>
    import networks
  File "/content/DAIN/networks/__init__.py", line 1, in <module>
    from .DAIN import DAIN
  File "/content/DAIN/networks/DAIN.py", line 4, in <module>
    from my_package.FilterInterpolation import FilterInterpolationModule
  File "/content/DAIN/my_package/FilterInterpolation/__init__.py", line 1, in <module>
    from .FilterInterpolationModule import *
  File "/content/DAIN/my_package/FilterInterpolation/FilterInterpolationModule.py", line 6, in <module>
    from .FilterInterpolationLayer import FilterInterpolationLayer,WeightLayer, PixelValueLayer,PixelWeightLayer,ReliableWeightLayer
  File "/content/DAIN/my_package/FilterInterpolation/FilterInterpolationLayer.py", line 4, in <module>
    import filterinterpolation_cuda as my_lib
ModuleNotFoundError: No module named 'filterinterpolation_cuda'

I am slowly running out of ideas to fix that. Does anyone have a working notebook, or an idea what else I could try to do? That would be great!

iBobbyTS commented 3 years ago

Hi there, I think that's caused by the build process of the DAIN packages.

ModuleNotFoundError: No module named 'filterinterpolation_cuda'

This means the filterinterpolation_cuda package is not installed. Did you run build.sh? Since you have a V100, which has compute capability 7.0, you should uncomment the line # '-gencode', 'arch=compute_70,code=sm_70', in DAIN/my_package/compiler_args.py, then run build.sh in both my_package and PWCNet.

Ad: for easier installation and usage, see iBobbyTS/VFIN. It is a kind of video interpolation toolkit, and of course DAIN is in it. It has a Colab notebook, and you can store the whole built VFIN in Drive, so each time you only need to extract the files to the Colab runtime and you can start using it.
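
The "no kernel image is available for execution on the device" message earlier in the thread generally means the extension was not compiled for the GPU's architecture, which fits the P100 (compute capability 6.0) working while the V100 (7.0) fails. A rough sketch for checking this before building, assuming the nvcc flags are kept as a flat list of strings as the quoted line suggests; `gencode_pair` and `covers_device` are hypothetical helpers, and the sample flag list is illustrative, not the actual contents of compiler_args.py:

```python
def gencode_pair(major: int, minor: int) -> list:
    """Build the nvcc '-gencode' pair for a compute capability, e.g. (7, 0)."""
    arch = f"{major}{minor}"
    return ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]


def covers_device(nvcc_args: list, major: int, minor: int) -> bool:
    """True if the flag list already contains an exact entry for this capability."""
    return gencode_pair(major, minor)[1] in nvcc_args


if __name__ == "__main__":
    # Illustrative flag list (assumption), with the sm_70 entry missing.
    sample_args = ["-gencode", "arch=compute_60,code=sm_60"]
    try:
        import torch  # only needed to query the actual GPU

        capability = torch.cuda.get_device_capability(0)
    except Exception:
        capability = (7, 0)  # fall back to the V100 discussed in this thread
    if not covers_device(sample_args, *capability):
        print("missing flags:", gencode_pair(*capability))
```

The check is deliberately strict (exact match only); it does not account for PTX entries that a newer GPU could JIT-compile, so treat a negative result as a hint rather than proof.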

semel1 commented 3 years ago

I have the same problem with the Tesla V100-SXM2-16GB: RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80). niuhuojian said that the P100-PCIE-16GB works, but unfortunately I can't choose which GPU gets assigned. The only reason I'm sticking with this version is that I'm intrigued by the ability to specify an arbitrary output FPS (e.g. 60 fps).

iBobbyTS commented 3 years ago

Did you try my suggestions a month ago?

semel1 commented 3 years ago

I uncommented '-gencode', 'arch=compute_70,code=sm_70' in compiler_args.py as you suggested and switched to !pip install torch==1.0.0 torchvision==0.2.1, as TaoTeCha suggested in another post (https://github.com/baowenbo/DAIN/issues/117#issuecomment-725081412). The Colab is working with a V100 now.
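
The uncommenting step can be scripted so it runs before build.sh on every fresh Colab runtime. A minimal sketch, assuming the commented line looks like the one quoted earlier in the thread; `uncomment_arch` is a hypothetical helper and the sample text is illustrative, not the real file:

```python
import re


def uncomment_arch(source: str, arch: str = "70") -> str:
    """Uncomment any commented '-gencode' line for the given architecture."""
    target = f"arch=compute_{arch},code=sm_{arch}"
    out = []
    for line in source.splitlines(keepends=True):
        if target in line and line.lstrip().startswith("#"):
            # Drop the first '#' and one following space, keep the indentation.
            line = re.sub(r"^(\s*)#\s?", r"\1", line, count=1)
        out.append(line)
    return "".join(out)


# Illustrative input only (assumption about the file's layout).
sample = (
    "nvcc_args = [\n"
    "    '-gencode', 'arch=compute_60,code=sm_60',\n"
    "    # '-gencode', 'arch=compute_70,code=sm_70',\n"
    "]\n"
)
print(uncomment_arch(sample))
```

To apply it to the real file, read DAIN/my_package/compiler_args.py with `pathlib.Path.read_text()`, pass the text through `uncomment_arch`, and write it back with `write_text()` before running the build. The function leaves already uncommented lines untouched, so running it twice is harmless.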

semel1 commented 3 years ago

Any chance of a Windows binary?

WilliamJudge94 commented 3 years ago

Regarding semel1's fix above (uncommenting '-gencode', 'arch=compute_70,code=sm_70' and installing torch==1.0.0 torchvision==0.2.1): the file to edit is DAIN/my_package/compiler_args.py