a96123155 / UTR-LM

GNU General Public License v3.0

Error when saving the fine-tuned model #5

Open chenruipu opened 4 months ago

chenruipu commented 4 months ago
        if args.local_rank == 0 or epoch == init_epochs+1 or epoch == init_epochs + args.epochs:
            metrics_val, loss_val, _ = eval_step(val_dataloader, model, epoch)
            loss_val_list.append(loss_val)
            if args.epochs >= args.patience:
                if metrics_val[2] > r2_best: 
                    path_saver = f'/scratch/users/yanyichu/UTR-LM/Cao/saved_models/{filename}_fold{i}_epoch{epoch}.pt'
                    r2_best, ep_best = metrics_val[2], epoch

                    torch.save(model.eval().state_dict(), path_saver) # 
                    print(f'****Saving model in {path_saver}: Best epoch = {ep_best} | Train Loss = {loss_train:.4f} |  Val Loss = {loss_val:.4f} | SpearmanR_best = {r2_best:.4f}')
                    model_best = deepcopy(model)

With this code, I can only save models at the fixed epochs named in the condition (epoch == init_epochs+1 and epoch == init_epochs + args.epochs), unless I either pass the --local_rank parameter, which is not shown in the TE fine-tuning command, or change the code to fit my own needs. The TE fine-tuning command is:

    cd ./Scripts/UTRLM_downstream
    CUDA_VISIBLE_DEVICES=0 python3 -m torch.distributed.launch --nproc_per_node=1 --master_port 9001 MJ4_Finetune_extract_append_predictor_CellLine_10fold-lr-huber-DDP.py --device_ids 0 --cell_line Muscle --label_type te_log --seq_type utr --inp_len 100 --huber_loss --modelfile ./Model/ESM2SI_3.1_fiveSpeciesCao_6layers_16heads_128embedsize_4096batchToks_MLMLossMin.pkl --finetune --bos_emb --lr 1e-2 --dropout3 0.2 --epochs 300 --prefix TE_ESM2SI_3.1.1e-2.M.dropout2

I think this should be changed in the official code.
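One possible change, as a rough sketch rather than the repository's actual code: treat an unset rank as the main process, so that validation and best-model saving run every epoch even when --local_rank is not passed. The helper names (is_main_process, should_evaluate, BestTracker), the use of -1 or None for "rank not set", and the fallback to the LOCAL_RANK environment variable are all my assumptions; I have not checked what default the script gives args.local_rank.

```python
import os


def is_main_process(local_rank=None):
    """Return True for rank 0, or when no rank was set at all.

    Assumption: an unset rank shows up as None or -1. If None is given,
    fall back to the LOCAL_RANK environment variable (set by torchrun).
    """
    if local_rank is None:
        local_rank = int(os.environ.get("LOCAL_RANK", "-1"))
    return local_rank in (-1, 0)


def should_evaluate(epoch, init_epochs, total_epochs, local_rank=None):
    """Mirror the condition from the issue, with an unset rank counted as main."""
    is_first = epoch == init_epochs + 1
    is_last = epoch == init_epochs + total_epochs
    return is_main_process(local_rank) or is_first or is_last


class BestTracker:
    """Remember the best validation metric and the epoch it came from."""

    def __init__(self):
        self.best = float("-inf")
        self.epoch = None

    def update(self, metric, epoch):
        """Return True when `metric` beats the best so far (time to save)."""
        if metric > self.best:
            self.best, self.epoch = metric, epoch
            return True
        return False
```

In the training loop, the check would become something like `if should_evaluate(epoch, init_epochs, args.epochs, args.local_rank):`, and `torch.save(...)` would run only when `tracker.update(metrics_val[2], epoch)` returns True.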