facebookresearch / ClassyVision

An end-to-end PyTorch framework for image and video classification
https://classyvision.ai
MIT License

Error running video classification tutorial #640

Closed ChristianEschen closed 3 years ago

ChristianEschen commented 3 years ago

Running step 5 in the video classification tutorial:

import time
import os

from classy_vision.trainer import LocalTrainer
from classy_vision.hooks import CheckpointHook
from classy_vision.hooks import LossLrMeterLoggingHook

hooks = [LossLrMeterLoggingHook(log_freq=4)]

checkpoint_dir = f"/tmp/classy_checkpoint_{time.time()}"
os.mkdir(checkpoint_dir)
hooks.append(CheckpointHook(checkpoint_dir, input_args={}))

task = task.set_hooks(hooks)

trainer = LocalTrainer()
trainer.train(task)

gives me the following error:

RuntimeError                              Traceback (most recent call last)

<ipython-input> in <module>()
     15 
     16 trainer = LocalTrainer()
---> 17 trainer.train(task)

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/trainer/local_trainer.py in train(self, task)
     25         set_cpu_device()
     26 
---> 27         super().train(task)

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/trainer/classy_trainer.py in train(self, task)
     43         task.on_start()
     44         while not task.done_training():
---> 45             task.on_phase_start()
     46             while True:
     47                 try:

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py in on_phase_start(self)
    943         self.phase_start_time_total = time.perf_counter()
    944 
--> 945         self.advance_phase()
    946 
    947         for hook in self.hooks:

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py in advance_phase(self)
    845         # Re-build dataloader & re-create iterator anytime membership changes.
    846         self._recreate_data_loader_from_dataset()
--> 847         self.create_data_iterator()
    848         # Set up pytorch module in train vs eval mode, update optimizer.
    849         self._set_model_train_mode()

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/tasks/classification_task.py in create_data_iterator(self)
    898         # are cleaned up.
    899         del self.data_iterator
--> 900         self.data_iterator = iter(self.dataloaders[self.phase_type])
    901 
    902     def _set_model_train_mode(self):

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __iter__(self)
    350             return self._iterator
    351         else:
--> 352             return self._get_iterator()
    353 
    354     @property

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _get_iterator(self)
    292             return _SingleProcessDataLoaderIter(self)
    293         else:
--> 294             return _MultiProcessingDataLoaderIter(self)
    295 
    296     @property

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __init__(self, loader)
    825             _utils.signal_handling._set_SIGCHLD_handler()
    826             self._worker_pids_set = True
--> 827         self._reset(loader, first_iter=True)
    828 
    829     def _reset(self, loader, first_iter=False):

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _reset(self, loader, first_iter)
    855         # prime the prefetch loop
    856         for _ in range(self._prefetch_factor * self._num_workers):
--> 857             self._try_put_index()
    858 
    859     def _try_get_data(self, timeout=_utils.MP_STATUS_CHECK_INTERVAL):

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _try_put_index(self)
   1089 
   1090         try:
-> 1091             index = self._next_index()
   1092         except StopIteration:
   1093             return

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _next_index(self)
    425 
    426     def _next_index(self):
--> 427         return next(self._sampler_iter)  # may raise StopIteration
    428 
    429     def _next_data(self):

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torch/utils/data/sampler.py in __iter__(self)
    225     def __iter__(self):
    226         batch = []
--> 227         for idx in self.sampler:
    228             batch.append(idx)
    229             if len(batch) == self.batch_size:

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torchvision/datasets/samplers/clip_sampler.py in __iter__(self)
     94 
     95         if isinstance(self.dataset, Sampler):
---> 96             orig_indices = list(iter(self.dataset))
     97             indices = [orig_indices[i] for i in indices]
     98 

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/classy_vision/dataset/classy_video_dataset.py in __iter__(self)
     45         num_samples = len(self)
     46         n = 0
---> 47         for clip in self.clip_sampler:
     48             if n < num_samples:
     49                 yield clip

/home/gandalf/anaconda3/envs/py36/lib/python3.6/site-packages/torchvision/datasets/samplers/clip_sampler.py in __iter__(self)
    173             s += length
    174             idxs.append(sampled)
--> 175         idxs_ = torch.cat(idxs)
    176         # shuffle all clips randomly
    177         perm = torch.randperm(len(idxs_))

RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /pytorch/build/aten/src/ATen/CPUType.cpp:2127 [kernel]
CUDA: registered at /pytorch/build/aten/src/ATen/CUDAType.cpp:2983 [kernel]
QuantizedCPU: registered at /pytorch/build/aten/src/ATen/QuantizedCPUType.cpp:297 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCPU: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCUDA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradXLA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse1: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse2: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse3: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
Tracer: registered at /pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:9654 [kernel]
Autocast: registered at /pytorch/aten/src/ATen/autocast_mode.cpp:258 [kernel]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

My setup is the following:

- PyTorch Version (e.g., 1.0): 1.7.0
- OS (e.g., Linux): Ubuntu 18.04
- How you installed PyTorch (`conda`, `pip`, source): conda
- Build command you used (if compiling from source):
- Python version: 3.6.11
- CUDA/cuDNN version: 11.0
- GPU models and configuration: 1x RTX 2080 Ti
- Any other relevant information: classy_vision is installed using pip
mannatsingh commented 3 years ago

Hi @ChristianEschen that's a weird error which I haven't seen before. Can you print the output of the following lines -

for phase in ["train", "test"]:
    iterator = datasets[phase].iterator()
    count = 0
    for _ in iterator:
        count += 1
        if count >= 10:
            break
    print(phase)
    print(count)

Also, which exact version of Python are you using (like 3.6.2) and how did you install classy?

ChristianEschen commented 3 years ago

I get the same error as presented above. I use Python 3.6.11. It is installed using pip install classy_vision.

mannatsingh commented 3 years ago

Ah, I just noticed, your CUDA version is 11.0 - that isn't supported by Classy Vision yet. Can you try downgrading to CUDA 10.2 and running this?

cc @vreis, @jackhamburger: since you two had worked with CUDA 11.0, do you think this could be related?
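
For reference, a quick sanity check of which CUDA build PyTorch is actually using (a minimal sketch, nothing Classy Vision specific):

import torch

# Print the versions relevant to this issue: the PyTorch build, the CUDA
# toolkit it was compiled against, and whether a GPU is visible at all.
print("torch:", torch.__version__)             # e.g. 1.7.0
print("built with CUDA:", torch.version.cuda)  # e.g. 11.0 or 10.2
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))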

ChristianEschen commented 3 years ago

Hi again,

I figured out that my UCF-101 dataset was not in the correct format: I had a "flattened" directory structure, with all the video files in one folder instead of one subfolder per action class.

So it was an "error 40", indicating the error was 40 centimeters from the device... Thanks anyway.
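
A rough sanity check of the layout (a sketch only; the path is a placeholder, and the one-subfolder-per-class expectation comes from how torchvision's UCF101 dataset discovers labels):

import os

video_dir = "/path/to/ucf101"  # placeholder; point at your dataset root

# torchvision's UCF101 dataset derives class labels from subfolder names,
# e.g. ucf101/ApplyEyeMakeup/v_ApplyEyeMakeup_g01_c01.avi. If every .avi
# sits directly in the root ("flattened"), no classes are found.
entries = os.listdir(video_dir)
class_dirs = [e for e in entries if os.path.isdir(os.path.join(video_dir, e))]
loose_avis = [e for e in entries if e.endswith(".avi")]
print(f"{len(class_dirs)} class folders, {len(loose_avis)} loose .avi files")
# Expect ~101 class folders and 0 loose .avi files for a correct layout.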

mannatsingh commented 3 years ago

Got it, I had figured that the dataset would throw an exception during initialization if there was a data error. Do you mind mentioning what the exact issue was and how you fixed it, for future users? :)

failable commented 3 years ago

I got the same issue.

The test snippet does not work for me, @mannatsingh:

Traceback (most recent call last):
  File "video_classification.py", line 120, in <module>
    for _ in iterator:
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 352, in __iter__
    return self._get_iterator()
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 294, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 827, in __init__
    self._reset(loader, first_iter=True)
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 857, in _reset
    self._try_put_index()
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1091, in _try_put_index
    index = self._next_index()
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 427, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 227, in __iter__
    for idx in self.sampler:
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torchvision/datasets/samplers/clip_sampler.py", line 87, in __iter__
    orig_indices = list(iter(self.dataset))
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/classy_vision/dataset/classy_video_dataset.py", line 47, in __iter__
    for clip in self.clip_sampler:
  File "/home/user/.pyenv/versions/env-wbGhSO8R-py3.7/lib/python3.7/site-packages/torchvision/datasets/samplers/clip_sampler.py", line 167, in __iter__
    idxs = torch.cat(idxs)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /pytorch/build/aten/src/ATen/CPUType.cpp:2127 [kernel]
CUDA: registered at /pytorch/build/aten/src/ATen/CUDAType.cpp:2983 [kernel]
QuantizedCPU: registered at /pytorch/build/aten/src/ATen/QuantizedCPUType.cpp:297 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCPU: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCUDA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradXLA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse1: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse2: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse3: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
Tracer: registered at /pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:9654 [kernel]
Autocast: registered at /pytorch/aten/src/ATen/autocast_mode.cpp:258 [kernel]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

I got a 24 MB ucf101_metadata.pt, so I assume my dataset format is correct?

>>> import torch
>>> a = torch.load('ucf101_metadata.pt')
>>> a.keys()
dict_keys(['video_paths', 'video_pts', 'video_fps'])
>>> len(a['video_paths'])
13320
>>> len(a['video_pts'])
13320
>>> len(a['video_fps'])
13320

BTW, I came from this issue, and have

for phase in ["train", "test"]:
    task.set_dataset(datasets[phase], phase)
    task.set_dataloader_mp_context('fork')

in the video classification tutorial, following the suggestion in the mentioned issue. Setting the context to fork, spawn, or forkserver, or setting num_workers to 0, all caused the same error.

Yevgnen commented 3 years ago

I encountered the same issue. It is probably related to torchvision upstream and is fixed in this commit. If one sets the data directory with a trailing slash, like

# set it to the folder where video files are saved
video_dir = "/path/to/ucf101/"

then before that commit the clip indices become [] and cause RuntimeError: There were no tensor arguments to this function. It's a bit unfriendly that torchvision itself does not print a warning or raise a clearer error.

Note that any unexpected dataset format may also cause this error. Updating torchvision fixed the issue for me.
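
For anyone stuck on an older torchvision, a trivial defensive sketch is to normalize the path before building the dataset so it never carries a trailing slash:

import os

video_dir = "/path/to/ucf101/"  # trailing slash, as in the broken case

# os.path.normpath drops the trailing slash ("/path/to/ucf101"), which
# sidesteps the empty-indices bug in older torchvision clip samplers.
video_dir = os.path.normpath(video_dir)
print(video_dir)  # -> /path/to/ucf101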

mannatsingh commented 3 years ago

Thanks so much @Yevgnen for the suggestion!

@liebkne I've verified that your metadata file looks correct - can you try @Yevgnen's suggestion and see if that works for you?