facebookresearch / LaViLa

Code release for "Learning Video Representations from Large Language Models"
MIT License
478 stars 42 forks

Segmentation fault when launching demo_narrator [was: Keys remapping seems not to work] #4

Closed amessina71 closed 1 year ago

amessina71 commented 1 year ago

Hi all, thank you for this great piece of work. I'm trying to run it on a dual-Ampere system with CUDA 11.6. I just cloned the repo and followed the install instructions; there were no problems or errors during installation. When I launch the demo_narrator.py script I get the output below: at first the key remapping procedure seems to go wrong for some reason, and then I get a segmentation fault. Thank you in advance for your help. Alberto

```
python demo_narrator.py --cuda --video-path some.mp4
/usr/local/lib/python3.9/site-packages/torchvision/transforms/_functional_video.py:5: UserWarning: The _functional_video module is deprecated. Please use the functional module instead.
  warnings.warn(
/usr/local/lib/python3.9/site-packages/torchvision/transforms/_transforms_video.py:25: UserWarning: The _transforms_video module is deprecated. Please use the transforms module instead.
  warnings.warn(
downloading model to modelzoo/vclm_openai_timesformer_large_336px_gpt2_xl.pt_ego4d.jobid_246897.ep_0003.md5sum_443263.pth

USING ATTENTION STYLE: frozen-in-time

100%|███████████████████████████████████████| 891M/891M [01:48<00:00, 8.58MiB/s]
=> Loading CLIP (ViT-L/14@336px) weights
_IncompatibleKeys(missing_keys=['temporal_embed',
  'blocks.0.timeattn.qkv.weight', 'blocks.0.timeattn.qkv.bias', 'blocks.0.timeattn.proj.weight', 'blocks.0.timeattn.proj.bias', 'blocks.0.norm3.weight', 'blocks.0.norm3.bias',
  'blocks.1.timeattn.qkv.weight', 'blocks.1.timeattn.qkv.bias', 'blocks.1.timeattn.proj.weight', 'blocks.1.timeattn.proj.bias', 'blocks.1.norm3.weight', 'blocks.1.norm3.bias',
  'blocks.2.timeattn.qkv.weight', 'blocks.2.timeattn.qkv.bias', 'blocks.2.timeattn.proj.weight', 'blocks.2.timeattn.proj.bias', 'blocks.2.norm3.weight', 'blocks.2.norm3.bias',
  'blocks.3.timeattn.qkv.weight', 'blocks.3.timeattn.qkv.bias', 'blocks.3.timeattn.proj.weight', 'blocks.3.timeattn.proj.bias', 'blocks.3.norm3.weight', 'blocks.3.norm3.bias',
  'blocks.4.timeattn.qkv.weight', 'blocks.4.timeattn.qkv.bias', 'blocks.4.timeattn.proj.weight', 'blocks.4.timeattn.proj.bias', 'blocks.4.norm3.weight', 'blocks.4.norm3.bias',
  'blocks.5.timeattn.qkv.weight', 'blocks.5.timeattn.qkv.bias', 'blocks.5.timeattn.proj.weight', 'blocks.5.timeattn.proj.bias', 'blocks.5.norm3.weight', 'blocks.5.norm3.bias',
  'blocks.6.timeattn.qkv.weight', 'blocks.6.timeattn.qkv.bias', 'blocks.6.timeattn.proj.weight', 'blocks.6.timeattn.proj.bias', 'blocks.6.norm3.weight', 'blocks.6.norm3.bias',
  'blocks.7.timeattn.qkv.weight', 'blocks.7.timeattn.qkv.bias', 'blocks.7.timeattn.proj.weight', 'blocks.7.timeattn.proj.bias', 'blocks.7.norm3.weight', 'blocks.7.norm3.bias',
  'blocks.8.timeattn.qkv.weight', 'blocks.8.timeattn.qkv.bias', 'blocks.8.timeattn.proj.weight', 'blocks.8.timeattn.proj.bias', 'blocks.8.norm3.weight', 'blocks.8.norm3.bias',
  'blocks.9.timeattn.qkv.weight', 'blocks.9.timeattn.qkv.bias', 'blocks.9.timeattn.proj.weight', 'blocks.9.timeattn.proj.bias', 'blocks.9.norm3.weight', 'blocks.9.norm3.bias',
  'blocks.10.timeattn.qkv.weight', 'blocks.10.timeattn.qkv.bias', 'blocks.10.timeattn.proj.weight', 'blocks.10.timeattn.proj.bias', 'blocks.10.norm3.weight', 'blocks.10.norm3.bias',
  'blocks.11.timeattn.qkv.weight', 'blocks.11.timeattn.qkv.bias', 'blocks.11.timeattn.proj.weight', 'blocks.11.timeattn.proj.bias', 'blocks.11.norm3.weight', 'blocks.11.norm3.bias',
  'blocks.12.timeattn.qkv.weight', 'blocks.12.timeattn.qkv.bias', 'blocks.12.timeattn.proj.weight', 'blocks.12.timeattn.proj.bias', 'blocks.12.norm3.weight', 'blocks.12.norm3.bias',
  'blocks.13.timeattn.qkv.weight', 'blocks.13.timeattn.qkv.bias', 'blocks.13.timeattn.proj.weight', 'blocks.13.timeattn.proj.bias', 'blocks.13.norm3.weight', 'blocks.13.norm3.bias',
  'blocks.14.timeattn.qkv.weight', 'blocks.14.timeattn.qkv.bias', 'blocks.14.timeattn.proj.weight', 'blocks.14.timeattn.proj.bias', 'blocks.14.norm3.weight', 'blocks.14.norm3.bias',
  'blocks.15.timeattn.qkv.weight', 'blocks.15.timeattn.qkv.bias', 'blocks.15.timeattn.proj.weight', 'blocks.15.timeattn.proj.bias', 'blocks.15.norm3.weight', 'blocks.15.norm3.bias',
  'blocks.16.timeattn.qkv.weight', 'blocks.16.timeattn.qkv.bias', 'blocks.16.timeattn.proj.weight', 'blocks.16.timeattn.proj.bias', 'blocks.16.norm3.weight', 'blocks.16.norm3.bias',
  'blocks.17.timeattn.qkv.weight', 'blocks.17.timeattn.qkv.bias', 'blocks.17.timeattn.proj.weight', 'blocks.17.timeattn.proj.bias', 'blocks.17.norm3.weight', 'blocks.17.norm3.bias',
  'blocks.18.timeattn.qkv.weight', 'blocks.18.timeattn.qkv.bias', 'blocks.18.timeattn.proj.weight', 'blocks.18.timeattn.proj.bias', 'blocks.18.norm3.weight', 'blocks.18.norm3.bias',
  'blocks.19.timeattn.qkv.weight', 'blocks.19.timeattn.qkv.bias', 'blocks.19.timeattn.proj.weight', 'blocks.19.timeattn.proj.bias', 'blocks.19.norm3.weight', 'blocks.19.norm3.bias',
  'blocks.20.timeattn.qkv.weight', 'blocks.20.timeattn.qkv.bias', 'blocks.20.timeattn.proj.weight', 'blocks.20.timeattn.proj.bias', 'blocks.20.norm3.weight', 'blocks.20.norm3.bias',
  'blocks.21.timeattn.qkv.weight', 'blocks.21.timeattn.qkv.bias', 'blocks.21.timeattn.proj.weight', 'blocks.21.timeattn.proj.bias', 'blocks.21.norm3.weight', 'blocks.21.norm3.bias',
  'blocks.22.timeattn.qkv.weight', 'blocks.22.timeattn.qkv.bias', 'blocks.22.timeattn.proj.weight', 'blocks.22.timeattn.proj.bias', 'blocks.22.norm3.weight', 'blocks.22.norm3.bias',
  'blocks.23.timeattn.qkv.weight', 'blocks.23.timeattn.qkv.bias', 'blocks.23.timeattn.proj.weight', 'blocks.23.timeattn.proj.bias', 'blocks.23.norm3.weight', 'blocks.23.norm3.bias',
  'head.weight', 'head.bias'], unexpected_keys=[])
Downloading config.json: 100%|███████████████████████████| 689/689 [00:00<00:00, 289kB/s]
Downloading pytorch_model.bin: 100%|█████████████████| 5.99G/5.99G [12:13<00:00, 8.77MB/s]
Segmentation fault (core dumped)
```

zhaoyue-zephyrus commented 1 year ago

Hi @amessina71 ,

The warning log with `_IncompatibleKeys(...)` is expected: when constructing the model, we first load the CLIP-pretrained weights for the spatial part of the TimeSformer (see Table 10 and Appendix F of our tech report), so the time-attention keys are missing at that point. Since the full checkpoint is loaded afterwards, you can safely ignore it.

The segmentation fault is a separate error; my guess is that the video cannot be loaded by decord. Could you try the example video first and see if the demo generates something? I can add a more generic video loader and will let you know once it's done.
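As a standalone sanity check (a sketch, not part of the repo; the path is a placeholder and decord is assumed to be installed per the setup instructions), you can try decoding one frame with decord directly:

```python
def can_decode(path):
    """Try to read the first frame of a video with decord.

    Returns True on success, False if decoding fails, and None when
    decord is not installed in this environment.
    """
    try:
        import decord
    except ImportError:
        return None  # decord not available here
    try:
        vr = decord.VideoReader(path)  # open the container
        _ = vr[0]                      # decode a single frame
        return True
    except Exception:
        return False

# e.g. can_decode("assets/3c0dffd0-e38e-4643-bc48-d513943dc20b_012_014.mp4")
```

If this returns True but the demo still crashes, the fault is likely elsewhere in the pipeline.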

amessina71 commented 1 year ago

Hi, thank you for your reply. I've tested the file under the assets folder, assets/3c0dffd0-e38e-4643-bc48-d513943dc20b_012_014.mp4, and I get the same error. Many thanks for your support. A.

zhaoyue-zephyrus commented 1 year ago

Then I don't think the issue comes from video loading. Can you check your local environment by running, e.g., https://github.com/pytorch/pytorch/blob/master/torch/utils/collect_env.py ?

amessina71 commented 1 year ago

Hi, thank you for following this up. Here is the output of the collect_env script. I suspect there might be some sort of incompatibility between the GPU architecture and the torch version. I will try to upgrade torch and update the thread. Thank you for your support. A.

```
root@59beef1d9912:/app# python3 collect_env.py
Collecting environment information...
/usr/local/lib/python3.9/site-packages/torch/cuda/__init__.py:143: UserWarning: NVIDIA A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70. If you want to use the NVIDIA A100-PCIE-40GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
PyTorch version: 1.10.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.9.10 (main, May 30 2022, 01:30:39) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-56-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to:
GPU models and configuration:
GPU 0: NVIDIA A100-PCIE-40GB
GPU 1: NVIDIA A100-PCIE-40GB
Nvidia driver version: 510.108.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] pytorchvideo==0.1.5
[pip3] torch==1.10.1
[pip3] torchvision==0.11.2
[conda] Could not collect
root@59beef1d9912:/app#
```
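To confirm this suspicion programmatically, here is a guarded sketch (not from the repo) that compares the GPU's compute capability with the CUDA architectures the installed torch wheel was built for; the A100 is sm_80, while the warning above lists support only up to sm_70:

```python
def gpu_arch_supported(device=0):
    """Check whether the installed torch build ships kernels for the local GPU.

    Returns True/False when checkable, or None when torch or CUDA is
    unavailable in this environment.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # e.g. (8, 0) for an A100 -> "sm_80"
    major, minor = torch.cuda.get_device_capability(device)
    # arch list the wheel was compiled for, e.g. ["sm_37", "sm_50", ...]
    return f"sm_{major}{minor}" in torch.cuda.get_arch_list()
```

On the environment above this should return False for the cu102 wheel, which is why a cu11x build is needed for Ampere GPUs.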

amessina71 commented 1 year ago

Hello. I've upgraded PyTorch to 1.11.0+cu113 and I no longer see that mismatch in the environment. However, I still get the segmentation fault (even with the test video). Below is the new environment, followed by the output of the gdb `bt` command after the SIGSEGV occurred during a gdb session. Best regards, A.

```
root@59beef1d9912:/app# python3 collect_env.py
Collecting environment information...
PyTorch version: 1.11.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.4 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.9.10 (main, May 30 2022, 01:30:39) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-56-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to:
GPU models and configuration:
GPU 0: NVIDIA A100-PCIE-40GB
GPU 1: NVIDIA A100-PCIE-40GB
Nvidia driver version: 510.108.03
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] pytorchvideo==0.1.5
[pip3] torch==1.11.0+cu113
[pip3] torchaudio==0.11.0+cu113
[pip3] torchvision==0.12.0+cu113
[conda] Could not collect
```

```
#0  0x00000000000682aa in ?? ()
#1  0x00007f1e96ad35ab in c10::detail::getNonDeterministicRandom(bool) () from /usr/local/lib/python3.9/site-packages/torch/lib/libc10.so
#2  0x00007f1ee9af3483 in at::CUDAGeneratorImpl::seed() () from /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so
#3  0x00007f1ee9af3b40 in std::call_once<at::cuda::detail::getDefaultCUDAGenerator(signed char)::{lambda()#1}>(std::once_flag&, at::cuda::detail::getDefaultCUDAGenerator(signed char)::{lambda()#1}&&)::{lambda()#2}::_FUN() () from /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so
#4  0x00007f1fcbad44df in pthread_once_slow (once_control=0x55e39ddb69b0, init_routine=0x7f1f4f246c20 <once_proxy>) at pthread_once.c:116
#5  0x00007f1ee9af208f in at::cuda::detail::getDefaultCUDAGenerator(signed char) () from /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_cuda_cpp.so
#6  0x00007f1f4c3b2398 in THCPModule_initExtension(_object*, _object*) () from /usr/local/lib/python3.9/site-packages/torch/lib/libtorch_python.so
#7  0x00007f1fcbe5cd84 in cfunction_vectorcall_NOARGS (func=<built-in method _cuda_init of module object at remote 0x7f1f4d3156d0>, args=,
    nargsf=<optimized out>, kwnames=<optimized out>) at Objects/methodobject.c:489
#8  0x00007f1fcbe9af26 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, args=0x55e3d43e8028,
    callable=<built-in method _cuda_init of module object at remote 0x7f1f4d3156d0>, tstate=0x55e39c5ea080) at ./Include/cpython/abstract.h:118
#9  PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x55e3d43e8028, callable=) at ./Include/cpython/abstract.h:127
#10 call_function (kwnames=0x0, oparg=, pp_stack=, tstate=0x55e39c5ea080) at Python/ceval.c:5077
#11 _PyEval_EvalFrameDefault (tstate=, f=, throwflag=) at Python/ceval.c:3489
#12 0x00007f1fcbe35e13 in _PyEval_EvalFrame (throwflag=0,
    f=Frame 0x55e3d43e7e80, for file /usr/local/lib/python3.9/site-packages/torch/cuda/__init__.py, line 216, in _lazy_init (), tstate=0x55e39c5ea080)
    at ./Include/internal/pycore_ceval.h:40
#13 function_code_fastcall (globals=, nargs=0, args=, co=, tstate=0x55e39c5ea080) at Objects/call.c:330
#14 _PyFunction_Vectorcall (func=, stack=0x0, nargsf=, kwnames=) at Objects/call.c:367
```

amessina71 commented 1 year ago

Hi guys, any news on this thread? It would be great if I could test the tool. Thank you in advance for your support.

gabrielegoletto commented 1 year ago

Hi @amessina71,

I was facing the same issue, it seems like inverting the import of decord and torch (torch first and then decord) solved it.

Hope it helps you as well :-)
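Concretely, the reordering amounts to something like this guarded sketch (the try/except is only so the snippet degrades gracefully where either library is missing; in demo_narrator.py you would simply swap the two plain import lines):

```python
def import_in_fixed_order():
    """Import torch first, then decord; return the names that actually loaded.

    Loading torch first lets its shared libraries initialize their own
    state before decord's, which is the workaround for the segfault
    described in this thread.
    """
    loaded = []
    try:
        import torch   # must come first
        loaded.append("torch")
    except ImportError:
        pass
    try:
        import decord  # safe to load once torch is in place
        loaded.append("decord")
    except ImportError:
        pass
    return loaded
```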

amessina71 commented 1 year ago

> Hi @amessina71,
>
> I was facing the same issue, it seems like inverting the import of decord and torch (torch first and then decord) solved it.
>
> Hope it helps you as well :-)

Yes it worked for me too! Thanks for the hint. Alberto

zhaoyue-zephyrus commented 1 year ago

> Hi @amessina71,
>
> I was facing the same issue, it seems like inverting the import of decord and torch (torch first and then decord) solved it.
>
> Hope it helps you as well :-)

Thank you @ezius07 for spotting this issue! I will add a patch shortly to fix this.

Best, Yue