bigpon / vcc20_baseline_cyclevae

Voice Conversion Challenge 2020 CycleVAE baseline system
MIT License

Issue when trying to resume the model #10

Open ZohaibSajjad opened 2 years ago

ZohaibSajjad commented 2 years ago

Hi Sir, I hope you are well. When I resume training my model after 8 hours of training, train.log shows a FileNotFoundError.

######## train.log ########

train_gru_cyclevae-mult-mix-scpost_laplace_batch.py --expdir exp/tr50_cyclevae-mult-jnt-mix-scpost_laplace_vcc2020_24kHz_hl1_hu1024_ld32_kse3_dse2_ksd3_dsd2_cyc2_lr1e-4_bs80_do0.5_epoch50_bsu5_bsue35_nwrk2_pad2300 --feats data/tr50_vcc2020_24kHz/feats.scp --feats_eval_list data/dv50_vcc2020_24kHz/feats_spk-SEF1.scp@data/dv50_vcc2020_24kHz/feats_spk-SEF2.scp@data/dv50_vcc2020_24kHz/feats_spk-TFF1.scp@data/dv50_vcc2020_24kHz/feats_spk-TGF1.scp@data/dv50_vcc2020_24kHz/feats_spk-TMF1.scp --stats_jnt data/tr50_vcc2020_24kHz/stats_jnt.h5 --spk_list SEF1@SEF2@TFF1@TGF1@TMF1 --in_dim 55 --stdim 5 --lr 1e-4 --hidden_units 1024 --batch_size 80 --batch_size_utt 5 --batch_size_utt_eval 35 --stats_list data/tr50_vcc2020_24kHz/stats_spk-SEF1.h5@data/tr50_vcc2020_24kHz/stats_spk-SEF2.h5@data/tr50_vcc2020_24kHz/stats_spk-TFF1.h5@data/tr50_vcc2020_24kHz/stats_spk-TGF1.h5@data/tr50_vcc2020_24kHz/stats_spk-TMF1.h5 --out_dim 50 --lat_dim 32 --n_cyc 2 --kernel_size_enc 3 --dilation_size_enc 2 --kernel_size_dec 3 --dilation_size_dec 2 --epoch_count 50 --hidden_layers 1 --do_prob 0.5 --n_workers 2 --GPU_device 0 --pad_len 2300 --resume exp/tr50_cyclevae-mult-jnt-mix-scpost_laplace_vcc2020_24kHz_hl1_hu1024_ld32_kse3_dse2_ksd3_dsd2_cyc2_lr1e-4_bs80_do0.5_epoch50_bsu5_bsue35_nwrk2_pad2300/checkpoint-.pkl

Started at Mon Mar 7 13:16:56 UTC 2022

# GRU_RNN_STOCHASTIC(
  (scale_in): Conv1d(55, 55, kernel_size=(1,), stride=(1,))
  (conv): TwoSidedDilConv1d(
    (conv): ModuleList(
      (0): Conv1d(55, 165, kernel_size=(3,), stride=(1,), padding=(4,))
      (1): Conv1d(165, 495, kernel_size=(3,), stride=(1,), dilation=(3,))
    )
  )
  (conv_drop): Dropout(p=0.5, inplace=False)
  (gru): GRU(564, 1024, batch_first=True)
  (gru_drop): Dropout(p=0.5, inplace=False)
  (out_1): Conv1d(1024, 69, kernel_size=(1,), stride=(1,))
)
GRU_RNN(
  (conv): TwoSidedDilConv1d(
    (conv): ModuleList(
      (0): Conv1d(37, 111, kernel_size=(3,), stride=(1,), padding=(4,))
      (1): Conv1d(111, 333, kernel_size=(3,), stride=(1,), dilation=(3,))
    )
  )
  (conv_drop): Dropout(p=0.5, inplace=False)
  (gru): GRU(383, 1024, batch_first=True)
  (gru_drop): Dropout(p=0.5, inplace=False)
  (out_1): Conv1d(1024, 50, kernel_size=(1,), stride=(1,))
  (scale_out): Conv1d(50, 50, kernel_size=(1,), stride=(1,))
)
Traceback (most recent call last):
  File "../../src/bin/train_gru_cyclevae-mult-mix-scpost_laplace_batch.py", line 2574, in <module>
    main()
  File "../../src/bin/train_gru_cyclevae-mult-mix-scpost_laplace_batch.py", line 443, in main
    checkpoint = torch.load(args.resume)
  File "/content/gdrive/MyDrive/project_folder/vcc20_baseline_cyclevae/baseline/tools/venv37pt14cu10/lib/python3.7/site-packages/torch/serialization.py", line 525, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/content/gdrive/MyDrive/project_folder/vcc20_baseline_cyclevae/baseline/tools/venv37pt14cu10/lib/python3.7/site-packages/torch/serialization.py", line 212, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/content/gdrive/MyDrive/project_folder/vcc20_baseline_cyclevae/baseline/tools/venv37pt14cu10/lib/python3.7/site-packages/torch/serialization.py", line 193, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'exp/tr50_cyclevae-mult-jnt-mix-scpost_laplace_vcc2020_24kHz_hl1_hu1024_ld32_kse3_dse2_ksd3_dsd2_cyc2_lr1e-4_bs80_do0.5_epoch50_bsu5_bsue35_nwrk2_pad2300/checkpoint-.pkl'

Accounting: time=7 threads=1

Ended (code 1) at Mon Mar 7 13:17:03 UTC 2022, elapsed time 7 seconds

This error occurred after I uncommented these three lines in run_cycleave.sh:

idx_resume  # line no. 159
${cuda_cmd} ${expdir}/log/train_resume-${idx_resume}.log  # line no. 405
--resume ${expdir}/checkpoint-${idx_resume}.pkl  # line no. 434
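
One detail in the log that may matter: the path passed to --resume ends in `checkpoint-.pkl`, with nothing between the hyphen and the extension. Since line 434 builds the name as `checkpoint-${idx_resume}.pkl`, this suggests that `idx_resume` expanded to an empty string when the script ran. The sketch below is one way to check this before calling `torch.load`. It assumes that checkpoints are stored in the experiment directory as `checkpoint-<N>.pkl` with an integer `N` (inferred from the path pattern, not confirmed from the repository), and both function names are hypothetical helpers, not part of the baseline code.

```python
import re
from pathlib import Path


def list_checkpoint_indices(expdir):
    """Return the sorted integer indices N of files named checkpoint-N.pkl.

    Assumption: checkpoints live directly in expdir under this naming scheme.
    """
    pattern = re.compile(r"^checkpoint-(\d+)\.pkl$")
    indices = []
    for path in Path(expdir).glob("checkpoint-*.pkl"):
        match = pattern.match(path.name)
        if match:  # skips names such as "checkpoint-.pkl"
            indices.append(int(match.group(1)))
    return sorted(indices)


def resolve_resume_path(expdir, idx_resume):
    """Build the checkpoint path and fail early with a readable message."""
    available = list_checkpoint_indices(expdir)
    if str(idx_resume).strip() == "":
        raise ValueError(
            "idx_resume is empty; available indices: %s" % available
        )
    path = Path(expdir) / ("checkpoint-%s.pkl" % idx_resume)
    if not path.is_file():
        raise FileNotFoundError(
            "%s not found; available indices: %s" % (path, available)
        )
    return path
```

Calling `list_checkpoint_indices` on the experiment directory shows which values of `idx_resume` would point to an existing file. An empty list would mean that no checkpoint matching this naming scheme was written to that directory.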