CyFeng16 / MVIMP

Mixed Video and Image Manipulation Program
GNU General Public License v3.0
71 stars · 20 forks

DAIN: "no kernel image is available for execution on the device" #33

Open Tetsuo7945 opened 3 years ago

Tetsuo7945 commented 3 years ago

When I proceed to try DAIN on a video file I receive the following:

(By the way, does or will your version of DAIN support Adaptive Record timestamps like the DAIN APP? Reference: https://imgur.com/a/7ihS2ir )

Current PyTorch version is 1.0.0
ffmpeg -hide_banner -loglevel warning -threads 4 -i /content/MVIMP/Data/Input/danny.mp4 /content/MVIMP/Data/Input/%8d.png
The video-image extracting job is done.

--------------------SUMMARY--------------------
Current input video file is danny.mp4,
danny.mp4's fps is 29.97,
danny.mp4 has 6211 frames.
Now we will process this video to 59.94 fps.
Frame split method will not be used.
--------------------NOW END--------------------

python3 -W ignore vfi_helper.py --src /content/MVIMP/Data/Input --dst /content/MVIMP/Data/Output --time_step 0.5
revise the unique id to a random numer 33628
Namespace(SAVED_MODEL='./model_weights/best.pth', alpha=[0.0, 1.0], arg='./model_weights/33628-Sun-Sep-06-02:23/args.txt', batch_size=1, channels=3, ctx_lr_coe=1.0, datasetName='Vimeo_90K_interp', datasetPath='', dataset_split=97, debug=False, depth_lr_coe=0.001, dst='/content/MVIMP/Data/Output', dtype=<class 'torch.cuda.FloatTensor'>, epsilon=1e-06, factor=0.2, filter_lr_coe=1.0, filter_size=4, flow_lr_coe=0.01, force=False, high_resolution=False, log='./model_weights/33628-Sun-Sep-06-02:23/log.txt', lr=0.002, netName='DAIN_slowmotion', no_date=False, numEpoch=100, occ_lr_coe=1.0, patience=5, rectify_lr=0.001, save_path='./model_weights/33628-Sun-Sep-06-02:23', save_which=1, seed=1, src='/content/MVIMP/Data/Input', time_step=0.5, uid=None, use_cuda=True, use_cudnn=1, weight_decay=0, workers=8)
cudnn is used
Interpolate 1 frames
The model weight is: ./model_weights/best.pth
************** current handling frame from /content/MVIMP/Data/Input. **************
************** current time_step is 0.5 **************
************** current output_dir is /content/MVIMP/Data/Output **************
************** high resolution method not used. **************
  0% 0/6210 [00:00<?, ?it/s]error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device
  0% 0/6210 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "vfi_helper.py", line 204, in <module>
    input_dir=args.src, output_dir=args.dst, time_step=args.time_step,
  File "vfi_helper.py", line 45, in continue_frames_insertion_helper
    time_step=time_step,
  File "vfi_helper.py", line 77, in frames_insertion_helper
    y_0 = model_inference_helper(im_0, im_1)
  File "vfi_helper.py", line 150, in model_inference_helper
    y_s, _, _ = model(torch.stack((x_0, x_1), dim=0))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/MVIMP/third_party/DAIN/networks/DAIN_slowmotion.py", line 170, in forward
    self.flownets, cur_offset_input, time_offsets=time_offsets
  File "/content/MVIMP/third_party/DAIN/networks/DAIN_slowmotion.py", line 268, in forward_flownets
    input
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/MVIMP/third_party/DAIN/PWCNet/PWCNet.py", line 241, in forward
    corr6 = self.corr(c16, c26)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/MVIMP/third_party/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 106, in forward
    )(input1, input2)
  File "/content/MVIMP/third_party/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 45, in forward
    self.corr_multiply,
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f429e43bfe1 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f429e43bdfa in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #2: correlation_forward_cuda(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, int, int, int, int, int, int) + 0x624 (0x7f429b008ba4 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x1556a (0x7f429b01456a in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x12767 (0x7f429b011767 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #5: python3() [0x50a7f5]
<omitting python frames>
frame #8: python3() [0x594b01]
frame #10: THPFunction_do_forward(THPFunction*, _object*) + 0x15c (0x7f42d876dbdc in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #12: python3() [0x54ac61]
frame #14: python3() [0x50a783]
frame #17: python3() [0x594b01]
frame #20: python3() [0x507f24]
frame #22: python3() [0x594b01]
frame #23: python3() [0x54ac61]
frame #25: python3() [0x50a783]
frame #27: python3() [0x507f24]
frame #29: python3() [0x594b01]
frame #32: python3() [0x507f24]
frame #34: python3() [0x594b01]
frame #35: python3() [0x54ac61]
frame #37: python3() [0x50a783]
frame #39: python3() [0x507f24]
frame #40: python3() [0x509c50]
frame #41: python3() [0x50a64d]
frame #43: python3() [0x507f24]
frame #45: python3() [0x594b01]
frame #48: python3() [0x507f24]
frame #50: python3() [0x594b01]
frame #51: python3() [0x54ac61]
frame #53: python3() [0x50a783]
frame #55: python3() [0x507f24]
frame #56: python3() [0x509c50]
frame #57: python3() [0x50a64d]
frame #59: python3() [0x507f24]
frame #60: python3() [0x509c50]
frame #61: python3() [0x50a64d]
frame #63: python3() [0x507f24]

ffmpeg -hide_banner -loglevel warning -threads 4 -r 59.94 -f image2 -i /content/MVIMP/Data/Input/%10d.png -y -c:v libx264 -preset slow -crf 8 /content/MVIMP/Data/Output/danny-59.94.mp4
[png @ 0x559727e3c800] Invalid PNG signature 0x1A45DFA301000000.
Error while decoding stream #0:0: Invalid data found when processing input
The image-video fusion job is done.
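For context, the time_step flag drives the SUMMARY numbers in the log above: time_step 0.5 inserts one frame between each consecutive pair, doubling the frame rate. A minimal sketch of that arithmetic (my own illustration, not code from MVIMP; `interpolation_summary` is a hypothetical helper name):

```python
def interpolation_summary(fps_in: float, n_frames: int, time_step: float):
    """Map DAIN's time_step to the expected output fps and frame count.

    time_step=0.5 means one interpolated frame per consecutive pair,
    so fps_out = fps_in / time_step.
    """
    fps_out = fps_in / time_step
    inserted_per_pair = int(round(1 / time_step)) - 1
    # n_frames originals have (n_frames - 1) gaps to fill
    n_out = n_frames + (n_frames - 1) * inserted_per_pair
    return fps_out, n_out

# For the log above: 29.97 fps, 6211 frames, time_step 0.5
print(interpolation_summary(29.97, 6211, 0.5))  # (59.94, 12421)
```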
CyFeng16 commented 3 years ago

Let me see if I can identify some possible causes.


CyFeng16 commented 3 years ago

@Tetsuo7945

At first, I suspected a compatibility problem with some code segments of DAIN, until I found this link, which clearly indicates that PyTorch builds are incompatible with certain GPU cards.

Solution: Refresh your Colab runtime and use the nvidia-smi command to verify that you have been allocated a sufficiently capable GPU card (T4/P100/V100 or similar), which should solve this problem and give you a speedup as well :)
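For anyone curious why the GPU model matters here: "no kernel image is available" means the prebuilt correlation_cuda extension contains no binary (or JIT-able PTX) for the allocated card's compute capability. A rough, simplified sketch of CUDA's fat-binary matching rule (illustrative only, not the actual driver logic; `kernel_image_available` is a hypothetical name):

```python
def kernel_image_available(device_cap: tuple, compiled: list) -> bool:
    """Simplified CUDA fat-binary matching.

    An sm_XY cubin runs on the same major architecture with minor
    revision >= Y; compute_XY PTX can be JIT-compiled for any device
    with capability >= X.Y.
    """
    dev = device_cap[0] * 10 + device_cap[1]
    for entry in compiled:
        kind, _, arch = entry.partition("_")   # e.g. "sm_75" or "compute_70"
        arch = int(arch)
        if kind == "sm" and arch // 10 == dev // 10 and arch <= dev:
            return True                        # binary-compatible cubin
        if kind == "compute" and arch <= dev:
            return True                        # forward-compatible PTX
    return False

# An extension built only for sm_60 (P100) fails on a T4 (cap 7.5):
print(kernel_image_available((7, 5), ["sm_60"]))       # False -> the error above
print(kernel_image_available((7, 5), ["compute_70"]))  # True  -> JIT fallback works
```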

CyFeng16 commented 3 years ago

Please close the issue once you have confirmed the fix.

Tetsuo7945 commented 3 years ago

> @Tetsuo7945
>
> At first, I suspected a compatibility problem with some code segments of DAIN, until I found this link, which clearly indicates that PyTorch builds are incompatible with certain GPU cards.
>
> Solution: Refresh your Colab runtime and use the nvidia-smi command to verify that you have been allocated a sufficiently capable GPU card (T4/P100/V100 or similar), which should solve this problem and give you a speedup as well :)

I've just tested again and confirmed the allocation of a V100 and received the same error:


Current PyTorch version is 1.0.0
ffmpeg -hide_banner -loglevel warning -threads 4 -i /content/MVIMP/Data/Input/test.mkv /content/MVIMP/Data/Input/%8d.png
The video-image extracting job is done.

--------------------SUMMARY--------------------
Current input video file is test.mkv,
test.mkv's fps is 29.97,
test.mkv has 6211 frames.
Now we will process this video to 59.94 fps.
Frame split method will not be used.
--------------------NOW END--------------------

python3 -W ignore vfi_helper.py --src /content/MVIMP/Data/Input --dst /content/MVIMP/Data/Output --time_step 0.5  
revise the unique id to a random numer 45031
Namespace(SAVED_MODEL='./model_weights/best.pth', alpha=[0.0, 1.0], arg='./model_weights/45031-Mon-Sep-07-01:52/args.txt', batch_size=1, channels=3, ctx_lr_coe=1.0, datasetName='Vimeo_90K_interp', datasetPath='', dataset_split=97, debug=False, depth_lr_coe=0.001, dst='/content/MVIMP/Data/Output', dtype=<class 'torch.cuda.FloatTensor'>, epsilon=1e-06, factor=0.2, filter_lr_coe=1.0, filter_size=4, flow_lr_coe=0.01, force=False, high_resolution=False, log='./model_weights/45031-Mon-Sep-07-01:52/log.txt', lr=0.002, netName='DAIN_slowmotion', no_date=False, numEpoch=100, occ_lr_coe=1.0, patience=5, rectify_lr=0.001, save_path='./model_weights/45031-Mon-Sep-07-01:52', save_which=1, seed=1, src='/content/MVIMP/Data/Input', time_step=0.5, uid=None, use_cuda=True, use_cudnn=1, weight_decay=0, workers=8)
cudnn is used
Interpolate 1 frames
The model weight is: ./model_weights/best.pth
************** current handling frame from /content/MVIMP/Data/Input. **************
************** current time_step is 0.5 **************
************** current output_dir is /content/MVIMP/Data/Output **************
************** high resolution method not used. **************
  0% 0/6210 [00:00<?, ?it/s]error in correlation_forward_cuda_kernel: no kernel image is available for execution on the device
  0% 0/6210 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "vfi_helper.py", line 204, in <module>
    input_dir=args.src, output_dir=args.dst, time_step=args.time_step,
  File "vfi_helper.py", line 45, in continue_frames_insertion_helper
    time_step=time_step,
  File "vfi_helper.py", line 77, in frames_insertion_helper
    y_0 = model_inference_helper(im_0, im_1)
  File "vfi_helper.py", line 150, in model_inference_helper
    y_s, _, _ = model(torch.stack((x_0, x_1), dim=0))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/MVIMP/third_party/DAIN/networks/DAIN_slowmotion.py", line 170, in forward
    self.flownets, cur_offset_input, time_offsets=time_offsets
  File "/content/MVIMP/third_party/DAIN/networks/DAIN_slowmotion.py", line 268, in forward_flownets
    input
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/MVIMP/third_party/DAIN/PWCNet/PWCNet.py", line 241, in forward
    corr6 = self.corr(c16, c26)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/content/MVIMP/third_party/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 106, in forward
    )(input1, input2)
  File "/content/MVIMP/third_party/DAIN/PWCNet/correlation_package_pytorch1_0/correlation.py", line 45, in forward
    self.corr_multiply,
RuntimeError: CUDA call failed (correlation_forward_cuda at correlation_cuda.cc:80)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f7453bd4fe1 in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f7453bd4dfa in /usr/local/lib/python3.6/dist-packages/torch/lib/libc10.so)
frame #2: correlation_forward_cuda(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, int, int, int, int, int, int) + 0x624 (0x7f74505a1ba4 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x1556a (0x7f74505ad56a in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x12767 (0x7f74505aa767 in /usr/local/lib/python3.6/dist-packages/correlation_cuda-0.0.0-py3.6-linux-x86_64.egg/correlation_cuda.cpython-36m-x86_64-linux-gnu.so)
frame #5: python3() [0x50a7f5]
<omitting python frames>
frame #8: python3() [0x594b01]
frame #10: THPFunction_do_forward(THPFunction*, _object*) + 0x15c (0x7f748df06bdc in /usr/local/lib/python3.6/dist-packages/torch/lib/libtorch_python.so)
frame #12: python3() [0x54ac61]
frame #14: python3() [0x50a783]
frame #17: python3() [0x594b01]
frame #20: python3() [0x507f24]
frame #22: python3() [0x594b01]
frame #23: python3() [0x54ac61]
frame #25: python3() [0x50a783]
frame #27: python3() [0x507f24]
frame #29: python3() [0x594b01]
frame #32: python3() [0x507f24]
frame #34: python3() [0x594b01]
frame #35: python3() [0x54ac61]
frame #37: python3() [0x50a783]
frame #39: python3() [0x507f24]
frame #40: python3() [0x509c50]
frame #41: python3() [0x50a64d]
frame #43: python3() [0x507f24]
frame #45: python3() [0x594b01]
frame #48: python3() [0x507f24]
frame #50: python3() [0x594b01]
frame #51: python3() [0x54ac61]
frame #53: python3() [0x50a783]
frame #55: python3() [0x507f24]
frame #56: python3() [0x509c50]
frame #57: python3() [0x50a64d]
frame #59: python3() [0x507f24]
frame #60: python3() [0x509c50]
frame #61: python3() [0x50a64d]
frame #63: python3() [0x507f24]

ffmpeg -hide_banner -loglevel warning -threads 4 -r 59.94 -f image2 -i /content/MVIMP/Data/Input/%10d.png -y -c:v libx264 -preset slow -crf 8 /content/MVIMP/Data/Output/test-59.94.mkv
The image-video fusion job is done.
CyFeng16 commented 3 years ago

@Tetsuo7945 According to this issue, the same error has indeed occurred with a V100. I will run it myself and we'll see.

UPDATE: The same error occurred when I tried it myself. I will keep investigating.

redna11 commented 3 years ago

For issues related to correlation_forward_cuda, I managed to make it work by adding the nvcc flags '-gencode', 'arch=compute_70,code=compute_70' to the build.

Environment: GTX 2080Ti, Python 3.7, torch 1.4, CUDA 10.0.

Reference: https://github.com/baowenbo/DAIN/issues/44

Hope it helps
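If it helps anyone applying the fix above: assembling the -gencode pairs for nvcc can be sketched like this (my own illustration, under the assumption that the extension's setup.py passes these via extra_compile_args for nvcc, as torch.utils.cpp_extension builds commonly do; `gencode_flags` is a hypothetical helper name):

```python
def gencode_flags(archs):
    """Build nvcc -gencode flags for the given compute capabilities.

    Emitting code=compute_XY (PTX) for the newest arch, alongside the
    code=sm_XY binaries, keeps the extension forward-compatible with
    newer GPUs via JIT compilation.
    """
    flags = []
    for a in sorted(archs):
        flags += ["-gencode", f"arch=compute_{a},code=sm_{a}"]
    newest = max(archs)
    flags += ["-gencode", f"arch=compute_{newest},code=compute_{newest}"]
    return flags

# e.g. cover P100 (6.0), V100 (7.0) and T4 (7.5) in one build:
print(gencode_flags([60, 70, 75]))
```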