MorinoseiMorizo / jparacrawl-finetune

An example usage of JParaCrawl pre-trained Neural Machine Translation (NMT) models.
http://www.kecl.ntt.co.jp/icl/lirg/jparacrawl/

Does "fine-tune" here mean "continue training"? #9

Open leminhyen2 opened 3 years ago

leminhyen2 commented 3 years ago

I checked the fairseq documentation and saw that it has these two parameters: --restore-file and --finetune-from-model.

In this repo, you used --restore-file to "fine-tune", as seen here: https://github.com/MorinoseiMorizo/jparacrawl-finetune/blob/3f7fb0b487bd1c12d744f166dc9f7174ba6a7c77/ja-en/fine-tune_kftt_mixed.sh#L69

I'm just wondering whether this is fine-tuning or continued training. I hope you can clarify that for me. Thanks!

MorinoseiMorizo commented 3 years ago

Hi, I'm sorry but I'm a bit busy this week. I will reply to your two questions as soon as possible.

leminhyen2 commented 3 years ago

Thank you, I look forward to it

MorinoseiMorizo commented 3 years ago

Sorry for the late reply. My script is doing fine-tuning.

I think the difference between fine-tuning and continued training lies in what data you use. If you continue training with the same training data as the pre-trained model, it is not fine-tuning. But if you continue training with other data, e.g., KFTT or JESC, then it is fine-tuning.

Since our pre-trained model is trained on JParaCrawl and my script uses KFTT as the training data, it is fine-tuning.

For the fairseq options, I'm actually not sure about the differences, but it looks like --finetune-from-model will reset the meters and the LR scheduler. https://fairseq.readthedocs.io/en/latest/command_line_tools.html#checkpoint
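To illustrate the difference, here is a rough sketch of how the two options would be passed to fairseq-train. This is not the exact command from fine-tune_kftt_mixed.sh; the data directory, checkpoint path, and omitted hyperparameters are placeholders.

```shell
# Option 1: --restore-file (what the script in this repo uses).
# Loads the full checkpoint, including the optimizer state, LR scheduler,
# and meters, and continues training on the KFTT data.
fairseq-train data-bin/kftt \
    --restore-file pretrained_model/checkpoint_best.pt \
    --arch transformer \
    # ... remaining training options ...

# Option 2: --finetune-from-model.
# Loads only the model weights from the checkpoint; the optimizer,
# LR scheduler, and meters are reset before training on KFTT.
fairseq-train data-bin/kftt \
    --finetune-from-model pretrained_model/checkpoint_best.pt \
    --arch transformer \
    # ... remaining training options ...
```

In both cases the new training data (KFTT here) is what makes it fine-tuning in the sense described above; the flags only differ in how much of the old training state is carried over.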

leminhyen2 commented 3 years ago

Ah, that's interesting to know. When someone says fine-tuning, especially in the computer vision domain, I usually think it means freezing the convolution/feature-extraction layers and retraining the dense layers for domain adaptation (like going from recognizing birds to recognizing chickens).

If this fine-tuning process is just continuing training with a different dataset, then I assume that when you trained the model from scratch with just the JParaCrawl data, you also used the same parameters?