Follow-up Inquiry: Issue with Revised Code for Training

aimagelab / VATr

MIT License


Follow-up Inquiry: Issue with Revised Code for Training #3

Closed by Kerry-zzx 1 year ago

Kerry-zzx commented 1 year ago

Hi, Vittorio Pippi,

After implementing the revisions you made to the code, I attempted to train the model using the updated version. However, I regret to inform you that I encountered difficulties and the training process did not yield the expected results.

This is the result after training 300 epochs:

Despite carefully incorporating the changes you suggested, the model's performance during training remained unsatisfactory. I wanted to reach out to you once again to request your assistance in troubleshooting this issue. I am eager to understand the root cause of the problem and identify any potential steps or adjustments necessary to rectify it.

Kerry-zzx commented 1 year ago

The newest result after training 1300 epochs:

vittoriopippi commented 1 year ago

Hello @Kerry-zzx! After seeing your issue, I cloned the repo and started training the network from scratch. Training is proceeding correctly, so I don't think there is a bug in the code.

These are qualitative results at different stages of training:

300 epochs: [image: 0300_epochs]

700 epochs: [image: 0700_epochs]

1000 epochs: [image: 1000_epochs]

1300 epochs: [image: 1300_epochs]

1600 epochs: [image: 1600_epochs]

This model was trained on an NVIDIA 2080 Ti for 19 hours.

Please double-check that you downloaded the correct files and that none of them are corrupted. This is the output of the sha1sum command on my copies:

04de5a8c292ae3fe8e4911e78624e1c5767b9aa1  IAM-32.pickle
c884705b413c9d6cd415b80b0ae2ae43c495385b  resnet_18_pretrained.pth
17ec41d538dada983c46ce65bdb013f799e61da3  unifont.pickle
20c50bfb8da324bc8aaf7d84714e7b40d46b8daa  english_words.txt
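If sha1sum is not available on your system, the same comparison can be sketched in Python. This is only a minimal sketch, not part of the repository: the helper names sha1_of_file and verify_files are hypothetical, and it assumes all four files sit in the files/ directory (the thread only confirms that location for resnet_18_pretrained.pth). The expected digests are copied from the sha1sum output above.

```python
import hashlib
from pathlib import Path

# SHA-1 digests copied from the sha1sum output quoted in this thread.
EXPECTED_SHA1 = {
    "IAM-32.pickle": "04de5a8c292ae3fe8e4911e78624e1c5767b9aa1",
    "resnet_18_pretrained.pth": "c884705b413c9d6cd415b80b0ae2ae43c495385b",
    "unifont.pickle": "17ec41d538dada983c46ce65bdb013f799e61da3",
    "english_words.txt": "20c50bfb8da324bc8aaf7d84714e7b40d46b8daa",
}


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large checkpoints are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(directory, expected=EXPECTED_SHA1):
    """Map each expected file name to 'ok', 'missing', or 'mismatch'."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif sha1_of_file(path) == want:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumption: the downloaded files live in files/ next to train.py.
    for name, status in verify_files("files").items():
        print(f"{status:9s} {name}")
```

Any file reported as mismatch or missing is worth re-downloading before starting a long training run.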

Remember that training requires the pretrained checkpoint of the feature encoder (this path is the default):

python train.py --feat_model_path files/resnet_18_pretrained.pth

Kerry-zzx commented 1 year ago

Hi,

I am glad to hear that you have been able to successfully train the network from scratch without encountering any issues. This suggests that there may not be a bug in the code itself, as you mentioned.

In light of your successful training results, I took your advice and reran the training on my end as well, after re-downloading all the pre-trained files. I am pleased to inform you that this time the results have improved considerably. It appears that the earlier poor results were most likely caused by corrupted or incomplete downloaded files.

I apologize for any confusion or inconvenience caused by my previous observations. It seems that the issue has been resolved, and the code is functioning as expected. I am grateful for your guidance and assistance throughout this process.