Hi @1162141320 ~ Sorry, some of the training logs have been lost. Both the DPAP strategy and the EHSM strategy can make the model's loss higher in the early stages of training than that of a normal ViT model, but it decreases quickly as training proceeds. Additionally, our configuration files assume 8 V100 GPUs, so if you use a different number of GPUs, the corresponding configuration parameters need to be adjusted accordingly.
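For illustration only, here is a minimal sketch of one common way to make that adjustment. It assumes the linear learning-rate scaling rule (learning rate proportional to global batch size), which is a general heuristic and not something confirmed for this repository. The function name `rescale_config`, its parameters, and the example values are all hypothetical and are not taken from the actual config files.

```python
def rescale_config(base_lr, batch_per_gpu, base_gpus=8, new_gpus=8,
                   new_batch_per_gpu=None):
    """Rescale the learning rate linearly with the global batch size.

    Assumption: the linear scaling rule applies. All names here are
    hypothetical and do not come from the repository's config files.
    """
    if new_batch_per_gpu is None:
        # Keep the per-GPU batch size unless told otherwise.
        new_batch_per_gpu = batch_per_gpu
    base_global = batch_per_gpu * base_gpus      # batch size of the 8-GPU setup
    new_global = new_batch_per_gpu * new_gpus    # batch size of the new setup
    return {
        "global_batch": new_global,
        "lr": base_lr * new_global / base_global,
    }


# Hypothetical values: moving from 8 GPUs to 4 GPUs.
halved = rescale_config(base_lr=1e-3, batch_per_gpu=128, new_gpus=4)

# Alternative: double the per-GPU batch so the global batch is unchanged.
same = rescale_config(base_lr=1e-3, batch_per_gpu=128, new_gpus=4,
                      new_batch_per_gpu=256)
```

If GPU memory allows, the second option (keeping the global batch size unchanged) leaves the learning rate as it is, which is usually the change least likely to alter training behaviour.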
Can you provide your training log? The loss in the early stages of training is much higher than with a normal ViT.