SilvioGiancola / SoccerNetv2-DevKit

Development Kit for the SoccerNet Challenge
MIT License

[Question] Features extracted from LQ and HQ versions of the same video are not matching #7

Closed Somdyuti2 closed 3 years ago

Somdyuti2 commented 3 years ago

Thanks for sharing the amazing work. I tried to extract the 512-dimensional ResNet features after PCA using VideoFeatureExtractor.py, with a few downloaded low-resolution (LQ) videos as input, and the extracted features exactly match the features provided for download at the same fps. However, when I tried the same with the corresponding high-resolution (HQ) versions of the videos as input, the features no longer match. I also used the start times from the video.ini files to ensure that the feature extractions in the two cases are synchronized.

I used the crop transform, so in the case of HQ videos, the frames were first resized to 398x224, which is the resolution at which the LQ versions were encoded. I also checked the frame data, and the same-sized frame tensors for the LQ and HQ videos of the same game are different! If the resized frames had been losslessly compressed when encoding the given LQ videos, the tensors ought to have been the same at this point. So perhaps lossy compression was applied, which is causing the discrepancy?
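One way to quantify how far apart two same-sized frames are is PSNR: identical frames give infinity, and any lossy step gives a finite value. Below is a minimal sketch, assuming NumPy is available, with synthetic arrays standing in for a decoded LQ frame and a resized HQ frame. `frame_psnr` is a hypothetical helper, not part of the DevKit.

```python
import numpy as np

def frame_psnr(a, b):
    """PSNR in dB between two uint8 frames of identical shape; inf if identical."""
    if a.shape != b.shape:
        raise ValueError("frames must have the same shape")
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10 * np.log10(255.0 ** 2 / mse))

# Synthetic stand-ins (assumption): 224 rows x 398 columns, RGB.
rng = np.random.default_rng(0)
lq = rng.integers(0, 256, size=(224, 398, 3), dtype=np.uint8)
hq_resized = lq.copy()
hq_resized[0, 0, 0] ^= 1  # flip one least-significant bit to mimic a tiny codec difference

psnr_same = frame_psnr(lq, lq)          # identical frames
psnr_diff = frame_psnr(lq, hq_resized)  # frames differing in a single value
```

A finite PSNR on real frames would confirm that the two decoding paths do not produce bit-identical pixels, without saying which step introduced the difference.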

I would like to know whether the models trained with the features extracted from the LQ videos would give approximately the same level of performance when tested with the features extracted from the HQ videos as above. If not, could you please share what preprocessing needs to be applied to HQ videos to get the same features and hence the same performance? For example, if compression was the issue above, then sharing the compression parameters used to generate the LQ videos would most likely solve the issue. Thanks for your help.

SilvioGiancola commented 3 years ago

Hi @Somdyuti2, thank you for reaching out. I am glad you managed to get the exact same features from the LQ videos. There are many reasons for the LQ and HQ features to be slightly different:

  1. The fps of the HQ videos varies, so sampling might return different frames than from the LQ videos, which are all at 25 fps.
  2. The start of the videos is different, hence sampling at 2 fps might return different frames.
  3. The conversion from HQ to 398x224 is lossy because the resolution changes: the pixel subsampling might create aliasing, depending on the chosen interpolation. The format and encoding are different too.
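
The first two points can be illustrated with a small sketch that computes which decoded frame lies nearest to each 2 fps sampling instant. The helper name, the round-to-nearest rule, and the 30 fps value for an HQ video are assumptions for illustration; the actual extractor may select frames differently.

```python
from fractions import Fraction
import math

def nearest_frame_times(native_fps, start_s, target_fps, n_samples):
    """Timestamps (seconds) of the decoded frames closest to each sampling instant.

    Hypothetical helper: exact rational arithmetic avoids float rounding noise.
    """
    native_fps = Fraction(native_fps)
    start = Fraction(start_s)
    times = []
    for k in range(n_samples):
        t = start + Fraction(k) / Fraction(target_fps)      # ideal sampling instant
        idx = math.floor(t * native_fps + Fraction(1, 2))   # nearest frame, ties round up
        times.append(idx / native_fps)                      # timestamp of that frame
    return times

lq_times = nearest_frame_times(25, 0, 2, 4)  # LQ video at 25 fps
hq_times = nearest_frame_times(30, 0, 2, 4)  # assumed 30 fps HQ video
print([float(t) for t in lq_times])
print([float(t) for t in hq_times])
```

Under these assumptions the second sample lands at 0.52 s in the 25 fps video and at 0.50 s in the 30 fps one, so the two extractions already see different frames before any resizing or encoding difference comes into play.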

Luckily for you, we tried our baselines with features extracted from both HQ and LQ videos, and the difference in performance is within noise range. We ended up extracting the features from the LQ videos because it is faster, more stable (same format), and requires less memory.

Finally, please see the script ConvertHQtoLQ.py for the details of the conversion from HQ to LQ.

Cheers,

Somdyuti2 commented 3 years ago

Hi @SilvioGiancola, thanks a lot for the detailed explanation. The ConvertHQtoLQ.py script was very helpful to me, and after performing the ffmpeg encoding as done in that script, I was able to get features that match the given set of features very closely starting from the HQ videos. There is still a very small difference (could be due to a different version of ffmpeg/ffmpy used at my end) between the two sets of features but it doesn't seem that this amount of difference will contribute to a covariate shift. It is also encouraging that you obtained very similar performance with features extracted from LQ and HQ videos. Thanks for taking the time to clarify my questions.
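
For anyone repeating this check, here is a minimal sketch of how the remaining gap between two feature sets could be measured, assuming NumPy is available. The arrays are synthetic stand-ins for the provided and re-extracted features, and `feature_gap` is a hypothetical helper, not part of the DevKit.

```python
import numpy as np

def feature_gap(ref, new):
    """Summarize the distance between two (n_frames, n_dims) feature arrays."""
    n = min(len(ref), len(new))  # tolerate a length mismatch at the end of the video
    ref = ref[:n].astype(np.float64)
    new = new[:n].astype(np.float64)
    diff = np.abs(ref - new)
    # Per-frame cosine similarity; the epsilon guards against all-zero vectors.
    cos = np.sum(ref * new, axis=1) / (
        np.linalg.norm(ref, axis=1) * np.linalg.norm(new, axis=1) + 1e-12
    )
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "min_cosine": float(cos.min()),
    }

# Synthetic example (assumption): 10 frames of 512-dimensional features plus small noise.
rng = np.random.default_rng(0)
provided = rng.normal(size=(10, 512))
reextracted = provided + rng.normal(scale=1e-3, size=provided.shape)
stats = feature_gap(provided, reextracted)
print(stats)
```

On real data, the two arrays would be loaded from the provided and re-extracted feature files; a minimum cosine similarity very close to 1 would support the reading that the residual difference is negligible.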