if args.local_rank == 0 or epoch == init_epochs + 1 or epoch == init_epochs + args.epochs:
    metrics_val, loss_val, _ = eval_step(val_dataloader, model, epoch)
    loss_val_list.append(loss_val)
    if args.epochs >= args.patience:
        if metrics_val[2] > r2_best:
            path_saver = f'/scratch/users/yanyichu/UTR-LM/Cao/saved_models/{filename}_fold{i}_epoch{epoch}.pt'
            r2_best, ep_best = metrics_val[2], epoch
            torch.save(model.eval().state_dict(), path_saver)
            print(f'****Saving model in {path_saver}: Best epoch = {ep_best} | Train Loss = {loss_train:.4f} | Val Loss = {loss_val:.4f} | SpearmanR_best = {r2_best:.4f}')
            model_best = deepcopy(model)
With this code, I can only save models from fixed epochs (the first and the last epoch of the run), unless I pass the --local_rank parameter, which is not shown in the fine-tuning command for TE:
cd ./Scripts/UTRLM_downstream
CUDA_VISIBLE_DEVICES=0 python3 -m torch.distributed.launch --nproc_per_node=1 --master_port 9001 MJ4_Finetune_extract_append_predictor_CellLine_10fold-lr-huber-DDP.py --device_ids 0 --cell_line Muscle --label_type te_log --seq_type utr --inp_len 100 --huber_loss --modelfile ./Model/ESM2SI_3.1_fiveSpeciesCao_6layers_16heads_128embedsize_4096batchToks_MLMLossMin.pkl --finetune --bos_emb --lr 1e-2 --dropout3 0.2 --epochs 300 --prefix TE_ESM2SI_3.1.1e-2.M.dropout2
Otherwise, I have to change the code to fit my own needs. I think this should be fixed in the official code.
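One possible fix is to fall back to rank 0 when --local_rank is not supplied, so that a single-GPU run evaluates and saves the best model every epoch. This is only a sketch under assumptions: resolve_local_rank and should_evaluate are hypothetical helpers (not part of the repository), and it assumes the script's --local_rank default is negative (e.g. -1) when the flag is absent, and that launchers such as torchrun export the LOCAL_RANK environment variable.

```python
import os


def resolve_local_rank(cli_value=None):
    """Return this process's local rank (hypothetical helper).

    Assumption: a negative or missing CLI value means --local_rank was not
    passed. In that case, read LOCAL_RANK from the environment (exported by
    torchrun); if that is also missing, treat the run as single-process, rank 0.
    """
    if cli_value is not None and cli_value >= 0:
        return cli_value
    env_value = os.environ.get("LOCAL_RANK")
    if env_value is not None:
        return int(env_value)
    return 0


def should_evaluate(local_rank, epoch, init_epochs, total_epochs):
    """Same condition as the original loop: the main process evaluates every
    epoch; other ranks only evaluate at the first and last epoch."""
    return (
        local_rank == 0
        or epoch == init_epochs + 1
        or epoch == init_epochs + total_epochs
    )
```

In the training loop, the first line of the snippet above would then become `if should_evaluate(resolve_local_rank(args.local_rank), epoch, init_epochs, args.epochs):`, leaving the rest of the saving logic unchanged.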