We observe that pre-training overfits, which is why we propose adding dropout. However, the network's representations may still be improving during this phase. We should therefore check whether fine-tuning performance keeps improving even after pre-training has started to overfit.
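One way to run this check is sketched below, assuming that a pre-training validation loss and a fine-tuning score have been recorded for the same sequence of checkpoints. The function names, the use of the validation-loss minimum as the point where overfitting begins, and the example numbers are all illustrative assumptions, not results from our experiments.

```python
def overfit_onset(val_losses):
    """Return the index of the checkpoint with the lowest pre-training
    validation loss. Assumption: overfitting starts after this point."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)


def finetune_gain_after_overfit(val_losses, finetune_scores):
    """Best fine-tuning score reached after the overfitting onset, minus the
    score at the onset. Positive means fine-tuning kept improving.
    Returns None if no checkpoint exists after the onset."""
    if len(val_losses) != len(finetune_scores):
        raise ValueError("need one fine-tuning score per checkpoint")
    onset = overfit_onset(val_losses)
    later = finetune_scores[onset + 1:]
    if not later:
        return None
    return max(later) - finetune_scores[onset]


# Hypothetical values for five checkpoints (not measured data).
val_losses = [2.0, 1.5, 1.3, 1.4, 1.6]            # pre-training validation loss
finetune_scores = [0.60, 0.70, 0.74, 0.76, 0.77]  # higher is better

gain = finetune_gain_after_overfit(val_losses, finetune_scores)
print("onset checkpoint:", overfit_onset(val_losses))
print("fine-tuning gain after onset:", gain)
```

A positive gain would suggest that the representations keep improving despite the overfitting. A gain near zero or below would support stopping pre-training earlier or regularizing it, for example with dropout.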