josnname closed this 1 year ago
During training, the input is a five-dimensional tensor [B, T, C, H, W]. Multiple frames are passed together along the T dimension, and the model loops over them internally. r1, r2, r3, r4 are never used during training. They are for inference, when the video sequence is much longer and you cannot fit all frames on the GPU: you feed T frames at a time as a batch and cycle the rX tensors from one call to the next.
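To make the "cycle the rX tensors" part concrete, here is a minimal sketch in plain Python. It is not the real model or its API: `toy_model`, `run_chunked`, the decay factors, and the use of plain floats in place of [B, T, C, H, W] tensors are all assumptions made for illustration. It only shows that feeding T frames at a time while passing the returned states back in gives the same result as processing the whole sequence in one call.

```python
def toy_model(frames, r1=None, r2=None, r3=None, r4=None):
    """Hypothetical stand-in for a recurrent video model.

    `frames` is a list of T numbers standing in for a [B, T, C, H, W] tensor.
    States that are not supplied start at zero.
    """
    rec = [0.0 if r is None else r for r in (r1, r2, r3, r4)]
    decay = (0.5, 0.6, 0.7, 0.8)  # arbitrary per-state decay factors (assumption)
    outputs = []
    for x in frames:  # internal loop over the T dimension
        rec = [d * r + (1 - d) * x for d, r in zip(decay, rec)]
        outputs.append(sum(rec) / len(rec))
    return (outputs, *rec)


def run_chunked(video, chunk_size):
    """Process a long video chunk_size frames at a time, cycling the rX states."""
    rec = [None] * 4  # no states before the first chunk
    outputs = []
    for start in range(0, len(video), chunk_size):
        chunk = video[start:start + chunk_size]
        out, *rec = toy_model(chunk, *rec)  # feed the returned states back in
        outputs.extend(out)
    return outputs


video = [float(i % 7) for i in range(20)]

# Training-style call: every frame in one call, states never passed in.
full, *_ = toy_model(video)

# Inference-style call: 6 frames at a time with the states cycled.
chunked = run_chunked(video, chunk_size=6)

# Same chunks, but the states are dropped between calls.
reset = [y for s in range(0, len(video), 6) for y in toy_model(video[s:s + 6])[0]]
```

In this toy setup `chunked` matches `full`, while `reset` does not, because each chunk in `reset` starts again from zero states instead of continuing from the previous chunk.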
Got it, thank you for the explanation.
r1, r2, r3, r4 are never reused, so they are always all-zero tensors.