Hello,
Thank you for the great work. I’m trying to reproduce the single-stream results (video or keypoints) as described in the repository. However, the model does not seem to converge. I’m using the pretrained S3D model as well. The only difference is that I’m training on the WLASL100 subset alone, whereas I noticed that you pre-trained the S3D model on the entire dataset before fine-tuning on the subsets.