When I train Latte on UCF101, the gradients of the linear layer are all zero. I find this strange: zero gradients?
Hi, in the Latte model initialization we adopt the widely used zero initialization, which can leave the gradients of the corresponding layers zero or very small at the start of training.
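To illustrate one way this can happen, here is a minimal NumPy sketch, under the assumption that the zero initialization is a DiT-style gating scheme (adaLN-Zero), where a zero-initialized gate scales a linear layer's output. All variable names here are illustrative, not Latte's actual code: at the first step the gate kills the gradient flowing into the linear weights, while the gate itself still receives a nonzero gradient.

```python
import numpy as np

# Sketch (assumed setup): output of a linear layer is scaled by a
# zero-initialized gate, as in DiT-style adaLN-Zero initialization.
rng = np.random.default_rng(0)
x = rng.standard_normal(4)         # input vector
W = rng.standard_normal((4, 4))    # linear layer weights
gate = np.zeros(4)                 # zero-initialized modulation gate

h = W @ x                  # linear layer output
y = gate * h               # gated output (all zeros at init)

# Manual backward pass for a toy loss L = sum(y), so dL/dy = 1
dL_dy = np.ones(4)
dL_dgate = dL_dy * h               # nonzero: the gate can still learn
dL_dW = np.outer(dL_dy * gate, x)  # exactly zero: gate == 0 blocks it

print(np.abs(dL_dW).max())         # 0.0 at the first step
print(np.abs(dL_dgate).max() > 0)  # True
```

So "all-zero" gradients on such a layer at the very beginning of training are expected; once the gate moves away from zero, the linear layer starts receiving gradient.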