VQAssessment / FAST-VQA-and-FasterVQA

[ECCV2022, TPAMI2023] FAST-VQA and its extended version, FasterVQA.
https://www.ecva.net/papers/eccv_2022/papers_ECCV/html/1225_ECCV_2022_paper.php

gpu error #9

Closed: likezjuisee closed this issue 1 year ago

likezjuisee commented 2 years ago

```
>>> from fastvqa import deep_end_to_end_vqa
>>> import torch
>>> dum_video = torch.randn((3, 240, 720, 1080))
>>> model_type = "fast"
>>> vqa = deep_end_to_end_vqa(True, model_type=model_type)
[True, True, True, False]
/home/saman/miniconda3/envs/fast_vqa/lib/python3.8/site-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2895.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Successfully loaded pretrained=[True] fast-vqa model from pretrained_path=[pretrained_weights/fast_vqa_v0_3.pth].
Please make sure the input is [torch.tensor] in [(C,T,H,W)] layout and with data range [0,1].
>>> vqa = deep_end_to_end_vqa(True, model_type=model_type, device="cuda:1")
[True, True, True, False]
/home/saman/miniconda3/envs/fast_vqa/lib/python3.8/site-packages/torch/cuda/__init__.py:146: UserWarning: NVIDIA GeForce RTX 3060 with CUDA capability sm_86 is not compatible with the current PyTorch installation. The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70. If you want to use the NVIDIA GeForce RTX 3060 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Successfully loaded pretrained=[True] fast-vqa model from pretrained_path=[pretrained_weights/fast_vqa_v0_3.pth].
Please make sure the input is [torch.tensor] in [(C,T,H,W)] layout and with data range [0,1].
>>> vqa(dum_video)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/saman/Projects/FAST-VQA/fastvqa/apis/fast_vqa_model.py", line 77, in __call__
    x = ((x.permute(1, 2, 3, 0) - self.mean) / self.std).permute(3, 0, 1, 2)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cpu!
```
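
For reference, the final `RuntimeError` here is a device mismatch: the model's normalization tensors (`self.mean`, `self.std`) live on `cuda:1` while `dum_video` is still on the CPU. A minimal sketch of the workaround, assuming the wrapper does not move inputs itself:

```python
import torch
from fastvqa import deep_end_to_end_vqa

# Dummy input just to exercise the pipeline; real inputs should be
# (C, T, H, W) with data range [0, 1], as the loader message says.
dum_video = torch.rand((3, 240, 720, 1080))

vqa = deep_end_to_end_vqa(True, model_type="fast", device="cuda:1")

# Move the input onto the same device as the model before calling it.
score = vqa(dum_video.to("cuda:1"))
```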

teowu commented 2 years ago

You may refer to https://discuss.pytorch.org/t/trouble-with-cuda-capability-sm-86/152974 to solve this problem.

teowu commented 2 years ago

See if `conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch` works for you.

teowu commented 2 years ago

Or `pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113`.
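
After reinstalling, a quick sanity check with standard PyTorch calls can confirm the wheel matches the GPU:

```python
import torch

print(torch.__version__)        # should end in +cu113 after the reinstall
print(torch.version.cuda)       # CUDA version the wheel was built against
print(torch.cuda.is_available())
# An RTX 3060 reports (8, 6), i.e. sm_86, which the cu113 wheels support.
print(torch.cuda.get_device_capability(0))
```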

likezjuisee commented 2 years ago

My cudatoolkit version is 11.7. Is that too high? Which version do you use?

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57       Driver Version: 515.57       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   43C    P0    43W / 170W |      0MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:06:00.0 Off |                  N/A |
|  0%   42C    P0    43W / 170W |      0MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
```

likezjuisee commented 2 years ago

If I want to load an mp4 video file and run inference with your models, how should I do it? I am not familiar with PyTorch; could you show me the code?

teowu commented 2 years ago

You might need to reinstall PyTorch to match your CUDA version, but to my knowledge there is no PyTorch build for CUDA 11.7 yet; you can try the build for CUDA 11.6. As for the one-step inference code, we are revising it and will release the new version soon.
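
In the meantime, a minimal sketch of reading an mp4 and calling the API from the first comment (decord is the decoder the repo uses; the file path and every-other-frame sampling here are placeholders):

```python
import torch
from decord import VideoReader
from fastvqa import deep_end_to_end_vqa

vr = VideoReader("my_video.mp4")  # placeholder path
# Sample every other frame; get_batch returns a (T, H, W, C) uint8 array.
# For long or high-resolution videos you may want to sample fewer frames.
frames = vr.get_batch(list(range(0, len(vr), 2))).asnumpy()
# Rearrange to (C, T, H, W) and scale to [0, 1], as the model expects.
video = torch.from_numpy(frames).permute(3, 0, 1, 2).float() / 255.0

vqa = deep_end_to_end_vqa(True, model_type="fast", device="cuda:0")
score = vqa(video.to("cuda:0"))
print(score)
```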

likezjuisee commented 2 years ago

Looking forward to your "one-step inference code"!

teowu commented 2 years ago

> Looking forward to your "one-step inference code"!

This is done. Please clone the newest version of the dev branch and run `python vqa.py` to use it.
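
Spelled out (repository URL from the header above; the branch name is as stated in the comment):

```bash
git clone -b dev https://github.com/VQAssessment/FAST-VQA-and-FasterVQA.git
cd FAST-VQA-and-FasterVQA
python vqa.py
```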

likezjuisee commented 2 years ago

Core dump:

```
>>> from fastvqa.models import DiViDeAddEvaluator
>>> device = "cuda"
>>> DiViDeAddEvaluator(**opt["model"]["args"]).to("cuda")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
>>> config
'./options/fast/f3dvqa-b.yml'
>>> f = open(config, "r")
>>> opt = yaml.safe_load(f)
>>> opt
{'name': 'Space_Time_Unified_FAST(3D)_1*1', 'num_epochs': 30, 'l_num_epochs': 0, 'warmup_epochs': 2.5, 'ema': True, 'save_model': True, 'batch_size': 16, 'num_workers': 6,
 'wandb': {'project_name': 'VQA_Experiments_2022'},
 'data': {'train': {'type': 'FusionDataset', 'args': {'phase': 'train', 'anno_file': './examplar_data_labels/train_labels.txt', 'data_prefix': '../datasets/LSVQ', 'sample_types': {'fragments': {'fragments_h': 7, 'fragments_w': 7, 'fsize_h': 32, 'fsize_w': 32, 'aligned': 4}}, 'clip_len': 32, 'frame_interval': 2, 't_frag': 8, 'num_clips': 1}},
          'val-livevqc': {'type': 'FusionDataset', 'args': {'phase': 'test', 'anno_file': './examplar_data_labels/LIVE_VQC/labels.txt', 'data_prefix': '../datasets/LIVE_VQC/', 'sample_types': {'fragments': {'fragments_h': 7, 'fragments_w': 7, 'fsize_h': 32, 'fsize_w': 32, 'aligned': 4}}, 'clip_len': 32, 'frame_interval': 2, 't_frag': 8, 'num_clips': 1}},
          'val-kv1k': {'type': 'FusionDataset', 'args': {'phase': 'test', 'anno_file': './examplar_data_labels/KoNViD/labels.txt', 'data_prefix': '../datasets/KoNViD/', 'sample_types': {'fragments': {'fragments_h': 7, 'fragments_w': 7, 'fsize_h': 32, 'fsize_w': 32, 'aligned': 4}}, 'clip_len': 32, 'frame_interval': 2, 't_frag': 8, 'num_clips': 1}},
          'val-ltest': {'type': 'FusionDataset', 'args': {'phase': 'test', 'anno_file': './examplar_data_labels/LSVQ/labels_test.txt', 'data_prefix': '../datasets/LSVQ/', 'sample_types': {'fragments': {'fragments_h': 7, 'fragments_w': 7, 'fsize_h': 32, 'fsize_w': 32, 'aligned': 4}}, 'clip_len': 32, 'frame_interval': 2, 't_frag': 8, 'num_clips': 1}},
          'val-l1080p': {'type': 'FusionDataset', 'args': {'phase': 'test', 'anno_file': './examplar_data_labels/LSVQ/labels_1080p.txt', 'data_prefix': '../datasets/LSVQ/', 'sample_types': {'fragments': {'fragments_h': 7, 'fragments_w': 7, 'fsize_h': 32, 'fsize_w': 32, 'aligned': 4}}, 'clip_len': 32, 'frame_interval': 2, 't_frag': 8, 'num_clips': 1}}},
 'model': {'type': 'DiViDeAddEvaluator', 'args': {'backbone': {'fragments': {'checkpoint': False, 'pretrained': None}}, 'backbone_size': 'swin_tiny_grpb', 'backbone_preserve_keys': 'fragments', 'divide_head': False, 'vqa_head': {'in_channels': 768, 'hidden_channels': 64}}},
 'optimizer': {'lr': 0.001, 'backbone_lr_mult': 0.1, 'wd': 0.05},
 'load_path': '../pretrained/swin_tiny_patch244_window877_kinetics400_1k.pth',
 'test_load_path': './pretrained_weights/FAST_VQA_3D_1*1.pth'}
>>> opt["model"]["args"]
{'backbone': {'fragments': {'checkpoint': False, 'pretrained': None}}, 'backbone_size': 'swin_tiny_grpb', 'backbone_preserve_keys': 'fragments', 'divide_head': False, 'vqa_head': {'in_channels': 768, 'hidden_channels': 64}}
>>> DiViDeAddEvaluator(**opt["model"]["args"]).to("cuda")
(8, 7, 7)
/home/hzqard/miniconda3/envs/FAST-VQA/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
(8, 7, 7)
(8, 7, 7)
(8, 7, 7)
None False
Setting backbone: fragments_backbone
Segmentation fault (core dumped)
```

GFiz commented 2 years ago

I'm having the same core dump issue

teowu commented 1 year ago

Hi, this might be due to running out of memory. You may check your device's memory and decrease `num_workers` in the option file to avoid this.

likezjuisee commented 1 year ago

> Hi, this might be due to running out of memory. You may check your device's memory and decrease `num_workers` in the option file to avoid this.

I have changed `num_workers` to 1, but I still get the same error.

```yaml
name: Space_Time_Unified_FAST(3D)_1*1
num_epochs: 30
l_num_epochs: 0
warmup_epochs: 2.5
ema: true
save_model: true
batch_size: 16
num_workers: 1
```

teowu commented 1 year ago

Hi, may I know your device info? I am trying to replicate and locate the error.

Best, Haoning

likezjuisee commented 1 year ago

(screenshot of device info)

Python 3.8.8

```
(FAST-VQA) hzqard@saman2:~/project/FAST-VQA-and-FasterVQA$ pip list | grep torch
torch          1.10.2+cu113
torchvision    0.11.3+cu113
```

dmumtaz commented 1 year ago

Changing the batch size to 4 helped me.
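
For reference, the corresponding lines in the option file shown above would then read (a sketch combining both workarounds from this thread):

```yaml
batch_size: 4
num_workers: 1
```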