FangyunWei / SLRT


NLA-SLR model performance does not reach the reported value #27

Closed: ganzobtn closed this issue 6 months ago

ganzobtn commented 1 year ago

We trained the NLA-SLR model with your code on the WLASL100 dataset and tried to reproduce the accuracy reported in your paper, but our model does not reach the reported value. However, when we test the trained model you shared, we get the paper's accuracy on our machine. The accuracy and loss graphs are attached below:

[Screenshots (2023-08-19): accuracy and loss curves of our WLASL100 training run]

We also ran the same configuration on 4 GPUs and on a single GPU, but the two training runs behaved completely differently. Their accuracy and loss graphs are attached below.

[Screenshots (2023-08-19): accuracy and loss curves of the 4-GPU and single-GPU runs]

Could you help us figure out what we did wrong, or what we need to check and change?

We did not change anything in the repository's configuration file. We trained for 100 epochs with a batch size of 4 on an NVIDIA DGX-1 with Tesla V100 GPUs, using the Docker image you provided.

2000ZRL commented 1 year ago

We use 8 V100 GPUs with a batch size of 4 per GPU. Thus, the effective batch size should be 32.
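A quick way to see why the 8-GPU, 4-GPU, and single-GPU runs can diverge is to compute the effective batch size for each setup. This is a minimal sketch, assuming standard data-parallel training in which the configured batch size is applied per GPU (as the maintainer describes) and gradients are averaged across GPUs. The function name `effective_batch_size` is hypothetical, not part of the SLRT code.

```python
def effective_batch_size(num_gpus: int, per_gpu_batch: int) -> int:
    """Number of samples contributing to one optimizer step under data parallelism.

    Assumes every GPU processes its own per-GPU batch and gradients are
    averaged across GPUs (hypothetical helper, not from the SLRT repository).
    """
    return num_gpus * per_gpu_batch


# Maintainers' reported setup: 8 V100 GPUs with 4 samples per GPU.
reference = effective_batch_size(8, 4)

# Same per-GPU batch size on fewer GPUs gives a smaller effective batch.
for gpus in (8, 4, 1):
    eff = effective_batch_size(gpus, 4)
    print(f"{gpus} GPU(s) x 4 per GPU -> effective batch {eff} "
          f"({eff / reference:.1%} of the reference setup)")
```

With the per-GPU batch size fixed at 4, the 4-GPU and single-GPU runs optimize with effective batches of 16 and 4 instead of 32, so the same learning-rate schedule is applied to very different gradient estimates.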

ganzobtn commented 1 year ago

We used 8 GPUs with a batch size of 32.

2000ZRL commented 1 year ago


Please follow the training procedure described in the Implementation Details section: pretrain each stream separately first, and only then train the whole VKNet.
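The staging the maintainer describes (pretrain each stream, then train the full network initialized from those checkpoints) can be expressed as an ordered plan with a dependency check. This is only a sketch of the ordering constraint; the stream names `video` and `keypoint`, the stage names, and the `Stage`, `build_training_plan`, and `check_order` helpers are all hypothetical and do not correspond to actual SLRT config files or scripts.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stage:
    name: str
    # Stages whose checkpoints this stage loads before it starts training.
    init_from: List[str] = field(default_factory=list)


def build_training_plan(streams: List[str]) -> List[Stage]:
    """Return stages in run order: one pretraining stage per stream, then joint training."""
    pretrain = [Stage(name=f"pretrain_{s}") for s in streams]
    joint = Stage(name="train_vknet", init_from=[st.name for st in pretrain])
    return pretrain + [joint]


def check_order(plan: List[Stage]) -> bool:
    """Raise if any stage depends on a stage that has not run before it."""
    done = set()
    for stage in plan:
        missing = [dep for dep in stage.init_from if dep not in done]
        if missing:
            raise RuntimeError(f"{stage.name} needs {missing} to finish first")
        done.add(stage.name)
    return True


streams = ["video", "keypoint"]  # hypothetical stream names
plan = build_training_plan(streams)
check_order(plan)
for stage in plan:
    print(stage.name, "<-", stage.init_from)
```

Training the full network directly, without the per-stream stages, corresponds to running only the last stage with nothing to initialize from, which `check_order` would reject.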