ganzobtn closed this issue 8 months ago.
We use 8 V100 GPUs with a batch size of 4 per GPU, so the effective batch size is 8 × 4 = 32.
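As a rough illustration of the arithmetic above, here is a minimal sketch. It assumes the configured batch size is per GPU, as the reply states, and that training is plain data-parallel. The helper names `effective_batch_size` and `accum_steps_for_target` are hypothetical and are not part of the NLA-SLR repository. Gradient accumulation is shown only as one common way to match the 8-GPU batch on fewer GPUs; the maintainers did not suggest it.

```python
def effective_batch_size(num_gpus: int, per_gpu_batch: int, accum_steps: int = 1) -> int:
    """Samples contributing to one optimizer step under data-parallel training."""
    return num_gpus * per_gpu_batch * accum_steps


def accum_steps_for_target(target: int, num_gpus: int, per_gpu_batch: int) -> int:
    """Gradient-accumulation steps needed to reach `target` with fewer GPUs (hypothetical helper)."""
    per_step = num_gpus * per_gpu_batch
    if target % per_step != 0:
        raise ValueError(f"{target} is not a multiple of {per_step}")
    return target // per_step


if __name__ == "__main__":
    # Reference setup from the reply: 8 GPUs x 4 samples per GPU.
    target = effective_batch_size(8, 4)
    for gpus in (8, 4, 1):
        # Assumes the per-GPU batch size stays at 4 when fewer GPUs are used.
        plain = effective_batch_size(gpus, 4)
        steps = accum_steps_for_target(target, gpus, 4)
        print(f"{gpus} GPU(s): effective batch {plain}, accumulate {steps} step(s) to reach {target}")
```

With the same per-GPU batch size of 4, a 4-GPU run optimizes with an effective batch of 16 and a single-GPU run with 4, which may partly explain why those runs train differently. Accumulation only approximates a larger batch: for example, batch-norm statistics are still computed per micro-batch.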
We used 8 GPUs with a batch size of 32.
Please follow the training procedure described in the Implementation Details section. You should pretrain each stream before training the whole VKNet.
We trained the NLA-SLR model on the WLASL100 dataset using your code, aiming to reproduce the accuracy reported in your paper, but our model does not reach the reported value. However, evaluating the trained model you shared does reproduce the paper's accuracy on our machine. Our accuracy and loss graphs are attached below:
We also ran experiments on 4 GPUs and on a single GPU with the same configuration, but the training processes were completely different. The accuracy and loss graphs are attached below.
Could you help us identify what we did wrong, or what we should check or change?
We did not modify the configuration file in the repository. We trained for 100 epochs with a batch size of 4 on an NVIDIA DGX-1 with Tesla V100 GPUs, using the Docker image you provided.