chaoyuaw / pytorch-coviar

Compressed Video Action Recognition
https://www.cs.utexas.edu/~cywu/projects/coviar/
GNU Lesser General Public License v2.1

Question about pre-processing mv and res #46

Open gbyy422990 opened 5 years ago

gbyy422990 commented 5 years ago

Hi, thanks for the good work. I have some questions about the mv and residual normalization. Could you please explain the code below in more detail?

if self._representation == 'mv':
    img = clip_and_scale(img, 20)    # why size=20?
    img += 128
    img = (np.minimum(np.maximum(img, 0), 255)).astype(np.uint8)
elif self._representation == 'residual':
    img += 128
    img = (np.minimum(np.maximum(img, 0), 255)).astype(np.uint8)

if self._representation == 'iframe':
    input = (input - self._input_mean) / self._input_std
elif self._representation == 'residual':
    input = (input - 0.5) / self._input_std    # why 0.5?
elif self._representation == 'mv':
    input = (input - 0.5)

chaoyuaw commented 5 years ago

Yes, this is to normalize the input range to match that of the pre-trained model.

The "clip" part keeps the range of the MV from getting too large (values are limited to within 20 pixels). This follows prior work (e.g. https://github.com/zbwglory/MV-release/tree/master/MV_extract/MV-code-release), and I think it usually helps training.
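
As a rough illustration of the clipping described here, the sketch below maps raw motion-vector values to uint8 so that displacements of about ±20 pixels fill the 0 to 255 range and anything larger saturates. The helper name `mv_to_uint8` and the `127.5 / size` scale factor are assumptions made for this example; it is not a quote of the repository's `clip_and_scale`.

```python
import numpy as np


def mv_to_uint8(mv, size=20):
    """Map motion vectors (in pixels) to uint8, saturating beyond +/- size.

    Hypothetical helper: the scale factor 127.5 / size is an assumption.
    """
    # Stretch [-size, size] to roughly [-127.5, 127.5]; astype truncates toward zero.
    scaled = (np.asarray(mv, dtype=np.float64) * (127.5 / size)).astype(np.int32)
    # Shift so that zero motion lands at 128, the middle of the uint8 range.
    shifted = scaled + 128
    # Motion larger than about `size` pixels falls outside [0, 255] and is clipped.
    return np.clip(shifted, 0, 255).astype(np.uint8)


mv = np.array([-40, -20, 0, 10, 20, 40])
print(mv_to_uint8(mv))
```

With these assumptions, -40 and 40 saturate at the ends of the range, while zero motion is stored as 128.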

Subtracting 0.5 is to make the input zero-mean. This again is just to make it match pre-training.
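
To see why 0.5 gives a roughly zero-mean input, here is a minimal sketch. It assumes the uint8 image is first scaled to [0, 1] by dividing by 255 (the thread does not show that step), so the +128 offset applied earlier becomes about 0.5. The function names and the std values are placeholders for illustration.

```python
import numpy as np


def center_mv(img_uint8):
    """Scale a uint8 motion-vector image to [0, 1] and subtract 0.5.

    Assumes division by 255 happens before the subtraction (not shown in the thread).
    """
    x = img_uint8.astype(np.float32) / 255.0
    return x - 0.5


def center_residual(img_uint8, std):
    """Same centering for residuals, followed by a per-channel division by std."""
    x = img_uint8.astype(np.float32) / 255.0
    # std has shape (3, 1, 1) so it broadcasts over a (3, H, W) image.
    return (x - 0.5) / std


# Zero motion or zero residual is stored as 128 after the +128 offset.
zero = np.full((3, 2, 2), 128, dtype=np.uint8)
std = np.array([0.25, 0.25, 0.25], dtype=np.float32).reshape(3, 1, 1)  # placeholder values

print(center_mv(zero[:2]).max())         # close to 0 (128/255 - 0.5 is about 0.002)
print(center_residual(zero, std).max())  # also close to 0
```

Under this assumption, "no motion" and "no residual" both map to values near zero, which is what the subtraction is meant to achieve.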

ShristiDasBiswas commented 6 months ago

How did you calculate self._input_mean and self._input_std? I want to do this for a different dataset, so I need my own values.
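
The thread does not say how these values were obtained. Since the earlier reply says the normalization is meant to match the pre-trained model, they are presumably tied to the pre-training data rather than to this dataset (an assumption, not confirmed here). If per-dataset statistics are wanted anyway, one common approach is to accumulate per-channel sums over frames scaled to [0, 1]. The sketch below does that with NumPy; `channel_mean_std` and the random frames are made up for illustration.

```python
import numpy as np


def channel_mean_std(frames):
    """Per-channel mean and std over an iterable of (H, W, C) uint8 frames.

    Hypothetical helper; it accumulates sums so frames need not fit in memory at once.
    """
    total = None
    total_sq = None
    count = 0
    for frame in frames:
        x = frame.astype(np.float64) / 255.0   # scale to [0, 1]
        x = x.reshape(-1, x.shape[-1])         # (H*W, C)
        if total is None:
            total = np.zeros(x.shape[1])
            total_sq = np.zeros(x.shape[1])
        total += x.sum(axis=0)
        total_sq += (x ** 2).sum(axis=0)
        count += x.shape[0]
    if count == 0:
        raise ValueError("no frames given")
    mean = total / count
    # Var = E[x^2] - E[x]^2; clamp tiny negatives caused by rounding.
    std = np.sqrt(np.maximum(total_sq / count - mean ** 2, 0.0))
    return mean, std


# Stand-in data: eight random 4x5 RGB frames instead of real decoded video.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8) for _ in range(8)]
mean, std = channel_mean_std(frames)
print(mean, std)
```

When fine-tuning from pre-trained weights, keeping the pre-training statistics is usually the safer choice; freshly computed values mainly make sense when training from scratch.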