leminhyen2 opened this issue 3 years ago
Hi, I'm sorry but I'm a bit busy this week. I will reply to your two questions as soon as possible.
Thank you, I look forward to it.
Sorry for the late reply. My script is doing fine-tuning.
I think the difference between fine-tuning and continued training is the data you use. If you continue training with the same training data as the pre-trained model, it is not fine-tuning; but if you continue training with other data, e.g., KFTT or JESC, then it is fine-tuning.
Since our pre-trained model was trained on JParaCrawl and my script uses KFTT as the training data, it is fine-tuning.
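To make that concrete, here is a minimal sketch of what such a run looks like. The paths, architecture, and save directory below are illustrative assumptions, not the exact values from this repo's scripts:

```bash
# Fine-tuning = continue training from the JParaCrawl pre-trained checkpoint,
# but on binarized KFTT data (a different dataset than the one used for
# pre-training). All paths here are hypothetical.
fairseq-train data-bin/kftt \
    --arch transformer \
    --restore-file pretrained/jparacrawl.pretrain.pt \
    --reset-optimizer --reset-lr-scheduler --reset-meters --reset-dataloader \
    --save-dir checkpoints/kftt-finetune
```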
As for the fairseq options, I'm actually not sure about the exact differences, but it looks like --finetune-from-model will reset the meters and LR scheduler. https://fairseq.readthedocs.io/en/latest/command_line_tools.html#checkpoint
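For reference, my rough understanding of the two options (the flag names are from the fairseq docs; everything else is an illustrative assumption):

```bash
# (1) --restore-file loads the full checkpoint, including the optimizer,
#     LR scheduler, meter, and dataloader state, so for fine-tuning those
#     pieces are reset explicitly:
fairseq-train data-bin/kftt \
    --restore-file pretrained/jparacrawl.pretrain.pt \
    --reset-optimizer --reset-lr-scheduler --reset-meters --reset-dataloader

# (2) --finetune-from-model loads the pre-trained weights and, according to
#     the docs, resets the meters and LR scheduler for you, so the explicit
#     --reset-* flags above should not be needed:
fairseq-train data-bin/kftt \
    --finetune-from-model pretrained/jparacrawl.pretrain.pt

# (Other required arguments such as --arch are omitted for brevity.)
```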
Ah, that's interesting to know. When someone says fine-tuning, especially in the computer vision domain, I often think it means freezing the convolutional/feature-extraction layers and retraining the dense layers for domain adaptation (e.g., going from recognizing birds to recognizing chickens).
If this fine-tuning process is just continuing training on a different dataset, then I assume that when you trained the model from scratch on just the JParaCrawl data, you used the same hyperparameters?
I checked the fairseq documentation and saw that it has these two options: --restore-file and --finetune-from-model.
In this repo, you used --restore-file to "fine-tune", as seen here: https://github.com/MorinoseiMorizo/jparacrawl-finetune/blob/3f7fb0b487bd1c12d744f166dc9f7174ba6a7c77/ja-en/fine-tune_kftt_mixed.sh#L69
I just wonder whether this is fine-tuning or continued training. I hope you can clarify that for me. Thanks!