KaiyangZhou / deep-person-reid

Torchreid: Deep learning person re-identification in PyTorch.
https://kaiyangzhou.github.io/deep-person-reid/
MIT License

visualising activation maps with video datasets #554

Open zualexander opened 1 year ago

zualexander commented 1 year ago

Hello, and thanks for the nice library, the very good documentation, and the inline code comments!

I created a custom VideoDataset by extending your base class, and I was already able to train the HACNN and OSNet networks on the data. Now I want to add activation map visualisations to the results, to analyze how the models make certain errors, as described here: https://kaiyangzhou.github.io/deep-person-reid/user_guide.html#visualize-activation-maps
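For context, the dataset class follows the documented pattern for custom video datasets; below is a rough, simplified sketch of that pattern (class, directory, and file names are placeholders, not my actual code):

```python
import os.path as osp

import torchreid
from torchreid.data import VideoDataset


class MyVideoDataset(VideoDataset):
    """Placeholder custom video dataset: every sample is a tracklet,
    i.e. a tuple (img_paths, pid, camid) where img_paths is a list of frame paths."""

    dataset_dir = 'my_video_dataset'

    def __init__(self, root='', **kwargs):
        self.root = osp.abspath(osp.expanduser(root))
        self.dataset_dir = osp.join(self.root, self.dataset_dir)

        # dummy lists just to show the expected (img_paths, pid, camid) structure
        train = [(['frames/0001_c1_f001.jpg', 'frames/0001_c1_f002.jpg'], 0, 0)]
        query = [(['frames/0002_c1_f001.jpg', 'frames/0002_c1_f002.jpg'], 1, 0)]
        gallery = [(['frames/0002_c2_f001.jpg', 'frames/0002_c2_f002.jpg'], 1, 1)]

        super(MyVideoDataset, self).__init__(train, query, gallery, **kwargs)


torchreid.data.register_video_dataset('my_video_dataset', MyVideoDataset)
```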

I always run into problems with the shape of the input tensor for the forward pass (since each data point in a video dataset consists of multiple images instead of one). Is my assumption right that there is no implementation for video datasets?

Here is the stack trace I get:

File "/deep-person-reid/tools/visualize_actmap.py", line 58, in visactmap
    outputs = model(imgs, return_featuremaps=True)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/dpri/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "deep-person-reid/torchreid/models/osnet_ain.py", line 433, in forward
    x = self.featuremaps(x)
  File "/deep-person-reid/torchreid/models/osnet_ain.py", line 422, in featuremaps
    x = self.conv1(x)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/dpri/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/deep-person-reid/torchreid/models/osnet_ain.py", line 56, in forward
    x = self.conv(x)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/dpri/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/dpri/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 446, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/dpri/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 443, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 5-dimensional input of size [100, 15, 3, 256, 128] instead
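For what it's worth, here is a minimal sketch of the workaround I have in mind for tools/visualize_actmap.py (untested for the actual visualisation output): flatten the batch and sequence dimensions before the forward pass, so every frame is treated as a single image. The model name and shapes below just mirror the traceback above.

```python
import torch
import torchreid

# same model family as in the traceback; pretrained=False only to avoid
# a weight download in this sketch
model = torchreid.models.build_model('osnet_ain_x1_0', num_classes=10, pretrained=False)
model.eval()

# video loaders yield 5-D batches: (batch, seq_len, 3, H, W)
imgs = torch.rand(2, 15, 3, 256, 128)

# flatten (batch, seq_len) -> batch*seq_len so the 2D CNN sees 4-D input
b, s, c, h, w = imgs.size()
flat = imgs.view(b * s, c, h, w)  # -> (30, 3, 256, 128)

with torch.no_grad():
    featuremaps = model(flat, return_featuremaps=True)  # (b*s, C, H', W')

# regroup per tracklet, e.g. to save one activation map per frame of a clip
featuremaps = featuremaps.view(b, s, *featuremaps.shape[1:])
print(featuremaps.shape)
```

The per-frame maps could also be averaged over the sequence dimension to get a single map per tracklet instead of one per frame.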