Closed zqiao11 closed 6 months ago
Hi @zqiao11, do the results only show the warning, with no error? If so, the fine-tuning has most likely finished. Could you check the outputs folder and see whether it contains a checkpoint?
Hi @liu-jc. Yes, the results only show the warning, and then the program terminates. There are checkpoints in the outputs folder. For example, one checkpoint is named epoch=4-step=500.ckpt.
Is that normal? How many epochs should it run with the default CLI configuration?
Hi @zqiao11, I think that is normal. The default maximum number of epochs is 100, as defined in conf/finetune/default.yaml. See below (updated, sorry, the previous link was not correct): https://github.com/SalesforceAIResearch/uni2ts/blob/8e07e899716c970787e9f2224e847c66c59d3eaf/cli/conf/finetune/default.yaml#L40
The program was terminated by early stopping. Feel free to change the maximum number of epochs.
@zqiao11, thanks for pointing this out. It is a good point. We will consider printing more information when fine-tuning finishes normally.
cc @gorold, what do you think about this? I can print more information at the end of the fine-tune file (e.g., that fine-tuning has finished and the checkpoint can be found in the xxx folder). I was also a bit confused when I first ran into this. I will submit a PR for you to review.
Thanks for your prompt reply @liu-jc. I just took a further look at default.yaml for finetuning.
Yes, it seems to be normal, as the patience of early stopping is 3. So the program terminated after the 7th epoch, and the checkpoint is from the 4th epoch.
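For anyone reading later, the arithmetic above can be checked with a small sketch. This is not the uni2ts or Lightning implementation, just a plain-Python imitation of patience-based early stopping under the assumption that training stops once the monitored loss has failed to improve for `patience` consecutive validation checks. The function name `simulate_early_stopping` and the loss values are made up for illustration (only the best score 10.386 comes from this thread), and epochs are counted from 0, as in the checkpoint name.

```python
def simulate_early_stopping(val_losses, patience=3, max_epochs=100):
    """Return (best_epoch, stopped_epoch); stopped_epoch is None if never triggered."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if loss < best_loss:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None


# Hypothetical validation losses, one per epoch (0-indexed).
losses = [12.0, 11.5, 11.0, 10.7, 10.386, 10.5, 10.6, 10.4, 10.3]
best_epoch, stopped_epoch = simulate_early_stopping(losses, patience=3)
print(f"best epoch: {best_epoch}, stopped after epoch: {stopped_epoch}")
```

With these made-up losses the best value occurs at epoch 4 and the three following epochs do not beat it, so the run ends after epoch 7 even though the maximum is 100.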
A small improvement suggestion: it would be better if a message could be printed at the experiment's conclusion, indicating that the program has successfully finished due to early stopping :))
Does adding verbose=True to the Lightning EarlyStopping callback work?
Yes, it would work by showing the following information:
Monitored metric val_loss did not improve in the last 3 records. Best score: 10.386. Signaling Trainer to stop.
But I still think it would be nice to include an additional message indicating that the entire process has finished. That would be friendlier for beginners who are not familiar with the code.
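A rough sketch of what such a closing message could look like, written as plain Python so it does not depend on any particular library version. The helper `finish_message` and the checkpoint path are hypothetical and not part of uni2ts. In a Lightning script the inputs could presumably be read after `trainer.fit` returns, from the `stopped_epoch` attribute of the EarlyStopping callback and the `best_model_path` attribute of the ModelCheckpoint callback, but that wiring is an assumption and is not shown here.

```python
def finish_message(stopped_epoch, max_epochs, best_ckpt_path):
    """Build a one-line summary of why fine-tuning ended and where the checkpoint is."""
    if stopped_epoch is not None and stopped_epoch + 1 < max_epochs:
        reason = f"early stopping after epoch {stopped_epoch}"
    else:
        reason = f"reaching max_epochs={max_epochs}"
    return (
        f"Fine-tuning finished normally ({reason}). "
        f"Best checkpoint: {best_ckpt_path}"
    )


# Hypothetical path; only the file name comes from this thread.
msg = finish_message(7, 100, "path/to/outputs/epoch=4-step=500.ckpt")
print(msg)
```

Printing the reason together with the checkpoint location would tell a new user at a glance that the run ended on purpose and where to find the result.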
Hi, thank you for the great work! I ran into a problem when I tried to run the finetuning example with the given CLI:
The experiment terminated after several epochs, and I got the following results:
Could you please suggest how to fix this problem? Thank you!