facebookresearch / SlowFast

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
Apache License 2.0

Slowfast model with Kinetics data: RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 2 but got size 4 for tensor number 1 in the list. #620

Open ccapontep opened 1 year ago

ccapontep commented 1 year ago

Hello,

I am having problems with the video tensor sizes when training the SlowFast model on the Kinetics dataset. Here is the error:

    Traceback (most recent call last):
      File "train.py", line 695, in <module>
        main()
      File "train.py", line 664, in main
        train(args)
      File "train.py", line 671, in train
        trainer.fit(classification_module, data_module)
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
        self._call_and_handle_interrupt(
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
        return trainer_fn(*args, **kwargs)
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
        results = self._run(model, ckpt_path=self.ckpt_path)
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
        results = self._run_stage()
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
        return self._run_train()
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_train
        self._run_sanity_check()
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1413, in _run_sanity_check
        val_loop.run()
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
        self.advance(*args, **kwargs)
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 154, in advance
        dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 204, in run
        self.advance(*args, **kwargs)
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 128, in advance
        output = self._evaluation_step(**kwargs)
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 226, in _evaluation_step
        output = self.trainer._call_strategy_hook("validation_step", *kwargs.values())
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1765, in _call_strategy_hook
        output = fn(*args, **kwargs)
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 344, in validation_step
        return self.model.validation_step(*args, **kwargs)
      File "train.py", line 316, in validation_step
        y_hat = self.model(x)
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorchvideo/models/net.py", line 43, in forward
        x = self.blocks[idx](x)
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorchvideo/models/net.py", line 121, in forward
        x_out = self.multipathway_fusion(x_out)
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorchvideo/models/slowfast.py", line 724, in forward
        x_s_fuse = torch.cat([x_s, fuse], 1)
    RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 2 but got size 4 for tensor number 1 in the list.

The transform of the data is the following:

    def _video_transform(self, mode: str):
        """
        This function contains example transforms using both PyTorchVideo and TorchVision
        in the same Callable. For 'train' mode, we use augmentations (prepended with
        'Random'); for 'val' mode we use the respective deterministic function.
        """
        args = self.args

        return ApplyTransformToKey(
                key="video",
                transform=Compose(
                    [
                        UniformTemporalSubsample(args.video_num_subsampled),
                        Lambda(lambda x: x / 255.0),
                        Normalize(args.video_means, args.video_stds),
                        ShortSideScale(args.video_min_short_side_scale),
                        CenterCrop(args.video_crop_size),
                        PackPathway(),
                        ]))

And PackPathway is as follows:

class PackPathway(torch.nn.Module):
    """
    Transform for converting video frames as a list of tensors.
    """
    def __init__(self):
        super().__init__()

    def forward(self, frames: torch.Tensor):
        alpha = 8
        fast_pathway = frames
        # Perform temporal sampling from the fast pathway.
        slow_pathway = torch.index_select(
            frames, 1, torch.linspace(0, frames.shape[1] - 1, frames.shape[1] // alpha ).long(),)

        print('--frames --> ', frames.shape)
        print('--fast --> ', fast_pathway.shape)
        print('--slow --> ', slow_pathway.shape, '\n --done-- \n')

        frame_list = [slow_pathway, fast_pathway]

        return frame_list

This prints the following:

    --frames -->  torch.Size([3, 16, 224, 224])
    --fast -->  torch.Size([3, 16, 224, 224])
    --slow -->  torch.Size([3, 2, 224, 224])
     --done--

The shape of the data after transforming and batching is printed in validation_step:

    def validation_step(self, batch, batch_idx):
        """
        This function is called in the inner loop of the evaluation cycle. For this
        simple example it's mostly the same as the training loop but with a different
        metric name.
        """
        x = batch[self.batch_key]
        print("x shape val: ", x[0].shape, "--", x[1].shape)

Output is:

    x shape val:  torch.Size([32, 3, 2, 224, 224]) -- torch.Size([32, 3, 16, 224, 224])

The error again being:

File "/home/ccapontep/anaconda3/envs/pytorchvideo/lib/python3.8/site-packages/pytorchvideo/models/slowfast.py", line 724, in forward x_s_fuse = torch.cat([x_s, fuse], 1) RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 2 but got size 4 for tensor number 1 in the list.

But the sizes do match in dimension 1; what differs between the two pathway inputs is the temporal dimension (dimension 3 of the batched tensors, 2 frames vs. 16 frames). I have also tried permuting the dimensions from [C, T, H, W] to [T, C, H, W] to work around this, but that gives a different error. Any idea of how to resolve this, please?
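
For reference, here is the temporal arithmetic that seems to be behind the mismatch. This is only a minimal sketch: it assumes the SlowFast network was built with pytorchvideo.models.create_slowfast using its default fusion stride of 4 along time (slowfast_fusion_conv_stride=(4, 1, 1), i.e. the model expects alpha = 4); the numbers come from the prints above.

    import torch

    # Clip produced by the transform above: [C, T, H, W] with T = 16 frames.
    frames = torch.randn(3, 16, 224, 224)

    pack_alpha = 8   # alpha hard-coded in PackPathway above
    model_alpha = 4  # assumed default temporal stride of the fusion conv in create_slowfast,
                     # i.e. slowfast_fusion_conv_stride=(4, 1, 1)

    slow_t = frames.shape[1] // pack_alpha   # 16 // 8 = 2 -> x_s has 2 frames
    fuse_t = frames.shape[1] // model_alpha  # 16 // 4 = 4 -> fuse has 4 frames
    print(slow_t, fuse_t)  # 2 4

    # torch.cat([x_s, fuse], 1) inside slowfast.py then fails because the two tensors
    # disagree in the temporal dimension, which is exactly
    # "Expected size 2 but got size 4 for tensor number 1 in the list".

If that assumption about the model holds, making alpha in PackPathway match the model's fusion stride (alpha = 4 for 16-frame clips), or building the network with slowfast_fusion_conv_stride=(8, 1, 1), should make the two pathways line up; I have not verified either against the tutorial code.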

maxboels commented 1 year ago

Hey, where did you find the video action recognition model? It isn't in this repo.

ccapontep commented 1 year ago

Hi, I have been using pytorchvideo, which includes the SlowFast model: https://github.com/facebookresearch/pytorchvideo

I have based my training on the example shown there: https://github.com/facebookresearch/pytorchvideo/blob/main/tutorials/video_classification_example/train.py
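
For context, a minimal sketch of how the pretrained SlowFast model is usually loaded from pytorchvideo; this is not taken from the linked train.py (which builds a slow-only ResNet by default), and the 32-frame / alpha = 4 note below refers to the published slowfast_r50 checkpoint:

    import torch

    # Pretrained SlowFast R50 from the PyTorchVideo model zoo (Kinetics-400).
    # The published checkpoint assumes 32-frame clips packed into slow/fast pathways with alpha = 4.
    model = torch.hub.load("facebookresearch/pytorchvideo", "slowfast_r50", pretrained=True)
    model = model.eval()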

maxboels commented 1 year ago

I found their implementation https://github.com/facebookresearch/pytorchvideo/blob/main/pytorchvideo/models/vision_transformers.py with the weights at https://github.com/facebookresearch/pytorchvideo/blob/main/docs/source/model_zoo.md

Thanks for helping :)

rajas1310 commented 9 months ago

Hey, did you find a solution to this issue?

3210448723 commented 4 months ago

ref: https://blog.csdn.net/WhiffeYF/article/details/133801160

You need to add the following configuration:

    VIS_MASK:
      ENABLE: True