facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

(wav2vec2) Why is a finetuned model not independent of the model it was pretrained from? #3530

Closed RuABraun closed 2 years ago

RuABraun commented 3 years ago

❓ Questions and Help

What is your question?

Say I pretrain a model, then finetune it, which creates a new model. When doing inference with the finetuned model, I can get an error because the pretrained model no longer exists (its checkpoint was deleted before training was finished).

Similarly, I have noticed that if I scp a finetuned model to another host and try to use it, I get an error because the pretrained model used for finetuning is not present on that host.

Can I change this behaviour? If so, how?

Thank you in advance.

harveenchadha commented 3 years ago

If you load the finetuned model and check its keys, there are two relevant entries:

w2v_path & w2v_args

w2v_path contains the path of the pretrained model.

If you don't want to load the pretrained model, try setting w2v_path to None; the code will then pick up the args from w2v_args.
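To see where these keys live, you can load the checkpoint with torch.load and inspect its config. A minimal sketch using a stand-in dict (the nesting below follows what is described in this thread; treat the exact layout as an assumption that may vary across fairseq versions):

```python
# Stand-in for torch.load("finetuned.pt"): a wav2vec2 CTC checkpoint is
# (roughly) a dict with this nesting; exact keys can differ by version.
ckpt = {
    "cfg": {
        "model": {
            "_name": "wav2vec_ctc",
            # Path to the pretraining checkpoint; loading tries to open it.
            "w2v_path": "/host-a/checkpoints/wav2vec_small.pt",
            # Copy of the pretraining args, usable when w2v_path is unset.
            "w2v_args": {"encoder_layers": 12, "encoder_embed_dim": 768},
        }
    },
    "model": {},  # state_dict of the finetuned weights
}

model_cfg = ckpt["cfg"]["model"]
print(model_cfg["w2v_path"])    # the hard-coded pretraining path
print("w2v_args" in model_cfg)  # True: the args were baked into the checkpoint
```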

RuABraun commented 3 years ago

Thank you for the response, let me try that

RuABraun commented 3 years ago

@harveenchadha I tried loading the finetuned model (as a dict), setting dct['cfg']['model']['w2v_path'] = None, saving it, and then using that model for inference. But I got an error:

omegaconf.errors.ValidationError: Non optional field cannot be assigned None
        full_key: w2v_path
        reference_type=Optional[Wav2Vec2CtcConfig]
        object_type=Wav2Vec2CtcConfig

It's quite annoying having to always keep two copies of the model around for inference... surely there must be a way to fix this?
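The error comes from omegaconf validating assignments against fairseq's structured config, where w2v_path is declared as a plain str rather than Optional[str]. A rough stdlib analogue of that check (the dataclass below is illustrative, not fairseq's actual Wav2Vec2CtcConfig):

```python
from dataclasses import dataclass

# Hypothetical stand-in for omegaconf structured-config validation:
# a field annotated as plain str rejects None on assignment.
@dataclass
class Wav2Vec2CtcConfigSketch:
    w2v_path: str = "???"

    def assign(self, key, value):
        declared = self.__dataclass_fields__[key].type  # annotated type
        if value is None and declared is str:  # non-optional field
            raise ValueError(
                f"Non optional field cannot be assigned None\n\tfull_key: {key}"
            )
        setattr(self, key, value)

cfg = Wav2Vec2CtcConfigSketch()
try:
    cfg.assign("w2v_path", None)
except ValueError as err:
    print(err)  # mirrors the ValidationError quoted above
```

This is why patching the saved dict with None fails at load time: the config schema itself forbids it, regardless of what is in the checkpoint file.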

jm-glowienke commented 3 years ago

@RuABraun I am facing a similar issue. Have you found a solution in the meantime?

RuABraun commented 3 years ago

Unfortunately not really. I keep a copy of the pretrained model around at the same path as on the host where I trained.

If I find a real solution I will update.

jm-glowienke commented 3 years ago

I may have found a solution for this issue. There is an option to pass a dictionary via --model-overrides to change a model argument at generation time. You can use this to reset the checkpoint path, which is also explained in the last code line before the citation in this example: https://github.com/pytorch/fairseq/blob/7818f6148da4ea04f0b4b3a2df780004c3580dad/examples/stories/README.md

I personally work with a modified XLMR model for translation and change the argument pretrained_xlm_checkpoint to "interactive" when generating. The model setup then skips loading the pre-trained checkpoint and loads only the fine-tuned checkpoint. It's a hacky workaround, but it does the job for me.

I don't know whether you have already tried --model-overrides, or whether it works with the wav2vec2 model, but maybe it helps! I'm also putting it here so that other people with the same problem can find it more easily.
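The idea behind --model-overrides is simple: the flag's value is parsed as a Python dict and merged into the saved model args before the model is rebuilt, so the stale path is replaced before anything tries to open it. A stand-in sketch of that merge on plain dicts (function name and use of ast.literal_eval are illustrative; the real parsing and merging live inside fairseq's option and checkpoint-loading code):

```python
import ast

def apply_model_overrides(saved_args: dict, overrides_flag: str) -> dict:
    """Merge a --model-overrides style dict string into saved model args."""
    # The flag is a Python dict literal, e.g. "{'w2v_path': '/new/path.pt'}";
    # ast.literal_eval is the safe stdlib way to parse it.
    overrides = ast.literal_eval(overrides_flag)
    merged = dict(saved_args)
    merged.update(overrides)  # overrides win over the checkpoint's saved args
    return merged

saved = {"w2v_path": "/host-a/checkpoints/wav2vec_small.pt", "dropout": 0.1}
# Point w2v_path at wherever the pretraining checkpoint lives on this host:
patched = apply_model_overrides(
    saved, "{'w2v_path': '/host-b/checkpoints/wav2vec_small.pt'}"
)
print(patched["w2v_path"])
```

On the command line this would look something like passing --model-overrides "{'w2v_path': '/host-b/checkpoints/wav2vec_small.pt'}" at generation time (whether the wav2vec2 model path honors this particular key is, as noted above, untested here).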

stale[bot] commented 2 years ago

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!

stale[bot] commented 2 years ago

Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!