Closed jbmaxwell closed 10 months ago
I see that solvers/base.py is reporting:

[09-12 19:28:54][flashy.solver][INFO] - Restoring weights and history.

so maybe it is loading the pretrained LM model? That seems like evidence, but I'm not certain.
Ah, okay, I see that a few people have used continue_from=//pretrained/facebook/musicgen-small
in the dora command, so I tried adding it to my solver config, and it seems to work:
[09-12 19:51:36][flashy.solver][INFO] - Loading a pretrained model. Ignoring 'load_best' and 'ignore_state_keys' params.
So I'll go with that and close this issue. Hopefully it helps someone in future.
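For anyone landing here later, the working approach above can be sketched as a dora invocation. Only the continue_from override comes from this thread; the solver name below is an assumption about a typical setup, so substitute your own:

```shell
# Fine-tune from the released pretrained checkpoint instead of training
# from scratch. The //pretrained/ prefix tells the solver to load weights
# from a published pretrained model rather than a local experiment checkpoint.
# NOTE: solver=musicgen/musicgen_base_32khz is an assumed example config name.
dora run solver=musicgen/musicgen_base_32khz \
    continue_from=//pretrained/facebook/musicgen-small
```

If it worked, you should see the "Loading a pretrained model" log line quoted above instead of "Restoring weights and history."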
How do I know for sure whether I'm fine-tuning the LM or training from scratch? I've used the override:
I think this is telling the solver to use the pretrained "large" LM, but I'm not totally sure (i.e., it could just be selecting the model architecture, not loading the weights). It would have been helpful if the parameter name were as specific as the one for the compression model, e.g.:

compression_model_checkpoint: <name or path>
lm_model_checkpoint: <name or path>

That would be unambiguous, imho. If anybody can verify this, that would be great. Thanks in advance.
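For illustration, here is roughly what that proposal would look like in a solver config. compression_model_checkpoint is the existing key referenced above; lm_model_checkpoint is the hypothetical counterpart being suggested and does not actually exist, and the checkpoint values are assumed examples:

```yaml
# Existing key (referenced in the comment above):
compression_model_checkpoint: //pretrained/facebook/encodec_32khz
# Hypothetical key proposed here for symmetry (NOT a real audiocraft option):
lm_model_checkpoint: //pretrained/facebook/musicgen-small
```

With explicit keys like these, it would be obvious which component's weights are being loaded, instead of inferring it from continue_from and the solver logs.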