facebookresearch / VMZ

VMZ: Model Zoo for Video Modeling
Apache License 2.0

ucf101 ip_csn_152 training/test dataset config #131

Open richardkxu opened 3 years ago

richardkxu commented 3 years ago

Hi,

Thanks for making this open source! I am a little confused about the UCF101 dataset config. Is the following correspondence correct? Should I leave args.val_file and args.train_file at their defaults?

def dataset_load_defaults(args):
    if args.dataset == "ucf101":
        args.traindir = "/data/richardkxu/UCF101/UCF101"  # unrar from  UCF101.rar
        args.valdir = "/data/richardkxu/UCF101/UCF101"  # unrar from  UCF101.rar

        args.val_file = "/checkpoint/bkorbar/DATASET_TV/ucf101_train_16fms.pth"
        args.train_file = "/checkpoint/bkorbar/DATASET_TV/ucf101_train_16fms.pth"

        args.annotation_path = (
            "/data/richardkxu/UCF101/ucfTrainTestlist/"  # unzip from UCF101TrainTestSplits-RecognitionTask.zip
        )
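
One way to sanity-check these paths before training is to count how many of the videos named in a split file actually exist under the video root. The sketch below is not part of VMZ: `count_listed_videos` is a hypothetical helper, and it assumes the standard UCF101 layout, where each line of a split file such as trainlist01.txt begins with a relative path like `ClassName/video.avi`.

```python
import os


def count_listed_videos(video_root, split_file):
    """Return (found, missing) counts for the videos named in a split file.

    Hypothetical helper, not part of VMZ. Assumes each non-empty line of
    split_file starts with a path relative to video_root, optionally
    followed by a class label (as in the UCF101 train/test lists).
    """
    found, missing = 0, 0
    with open(split_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            rel_path = line.split()[0]  # drop the trailing label, if any
            if os.path.isfile(os.path.join(video_root, rel_path)):
                found += 1
            else:
                missing += 1
    return found, missing
```

For the config above, this would be called with args.traindir as `video_root` and a split file inside args.annotation_path (for example trainlist01.txt) as `split_file`. A `found` count of 0 would suggest that the directory layout does not match what the split files expect.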

With the above ucf101 config, I encounter the following dataset error when running train.py --model ip_csn_152. It seems that the train and test datasets both have size 0. How can I resolve this error?

/home/richardkxu/anaconda3/envs/csn-env/lib/python3.7/site-packages/torchvision/__init__.py:64: UserWarning: video_reader video backend is not available
  warnings.warn("video_reader video backend is not available")
Not using distributed mode

torch version:  1.4.0
torchvision version:  0.5.0
2021-04-23 12:23:38.081063: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-04-23 12:23:38.081079: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Loading data
     Loading datasets
     Loading train data

     Loading validation data

/home/richardkxu/Documents/csn-fork/pt/vmz/models/csn.py:62: UserWarning: Unrecognized pretraining dataset, continuing with randomly initialized network. Available pretrainings: {avail_pretrainings}
  UserWarning,
Creating model
<generator object Module.parameters at 0x7f76799176d0>
Start training
Traceback (most recent call last):
  File "/home/richardkxu/Documents/csn-fork/pt/vmz/func/train.py", line 306, in <module>
    train_main(args)
  File "/home/richardkxu/Documents/csn-fork/pt/vmz/func/train.py", line 273, in train_main
    args.apex,
  File "/home/richardkxu/Documents/csn-fork/pt/vmz/func/train.py", line 38, in train_one_epoch
    for data in metric_logger.log_every(data_loader, print_freq, header):
  File "/home/richardkxu/Documents/csn-fork/pt/vmz/common/log.py", line 171, in log_every
    for obj in iterable:
  File "/home/richardkxu/anaconda3/envs/csn-env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "/home/richardkxu/anaconda3/envs/csn-env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 746, in __init__
    self._try_put_index()
  File "/home/richardkxu/anaconda3/envs/csn-env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 861, in _try_put_index
    index = self._next_index()
  File "/home/richardkxu/anaconda3/envs/csn-env/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 339, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/home/richardkxu/anaconda3/envs/csn-env/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 200, in __iter__
    for idx in self.sampler:
  File "/home/richardkxu/Documents/csn-fork/pt/vmz/common/sampler.py", line 117, in __iter__
    idxs = torch.cat(idxs)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CUDATensorId, CPUTensorId, VariableTensorId]

Process finished with exit code 1
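
The traceback ends in torch.cat(idxs) receiving an empty list, which is consistent with a dataset of length 0. A small guard placed right after dataset construction would surface this earlier and with a clearer message. This is only a sketch: `require_non_empty` is a hypothetical helper, not something VMZ or torchvision provides, and it assumes only that the dataset object supports `len()`.

```python
def require_non_empty(dataset, name):
    """Raise a descriptive error if a dataset has no samples.

    Hypothetical helper, not part of VMZ. Works with any object that
    implements __len__, which map-style PyTorch datasets do.
    """
    n = len(dataset)
    if n == 0:
        raise ValueError(
            f"{name} dataset is empty: check the video directory, "
            "the annotation path, and any cached dataset file"
        )
    return n
```

Calling it on both the train and validation datasets before creating the samplers would stop the run at the point where the data is found to be missing, rather than inside the sampler.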