Hi Sir,
I hope you are doing well. When I try to resume training my model after 8 hours of training, it fails with a FileNotFoundError, as shown in train.log below.
######## train.log ########
#
GRU_RNN_STOCHASTIC(
(scale_in): Conv1d(55, 55, kernel_size=(1,), stride=(1,))
(conv): TwoSidedDilConv1d(
(conv): ModuleList(
(0): Conv1d(55, 165, kernel_size=(3,), stride=(1,), padding=(4,))
(1): Conv1d(165, 495, kernel_size=(3,), stride=(1,), dilation=(3,))
)
)
(conv_drop): Dropout(p=0.5, inplace=False)
(gru): GRU(564, 1024, batch_first=True)
(gru_drop): Dropout(p=0.5, inplace=False)
(out_1): Conv1d(1024, 69, kernel_size=(1,), stride=(1,))
)
GRU_RNN(
(conv): TwoSidedDilConv1d(
(conv): ModuleList(
(0): Conv1d(37, 111, kernel_size=(3,), stride=(1,), padding=(4,))
(1): Conv1d(111, 333, kernel_size=(3,), stride=(1,), dilation=(3,))
)
)
(conv_drop): Dropout(p=0.5, inplace=False)
(gru): GRU(383, 1024, batch_first=True)
(gru_drop): Dropout(p=0.5, inplace=False)
(out_1): Conv1d(1024, 50, kernel_size=(1,), stride=(1,))
(scale_out): Conv1d(50, 50, kernel_size=(1,), stride=(1,))
)
Traceback (most recent call last):
File "../../src/bin/train_gru_cyclevae-mult-mix-scpost_laplace_batch.py", line 2574, in <module>
main()
File "../../src/bin/train_gru_cyclevae-mult-mix-scpost_laplace_batch.py", line 443, in main
checkpoint = torch.load(args.resume)
File "/content/gdrive/MyDrive/project_folder/vcc20_baseline_cyclevae/baseline/tools/venv37pt14cu10/lib/python3.7/site-packages/torch/serialization.py", line 525, in load
with _open_file_like(f, 'rb') as opened_file:
File "/content/gdrive/MyDrive/project_folder/vcc20_baseline_cyclevae/baseline/tools/venv37pt14cu10/lib/python3.7/site-packages/torch/serialization.py", line 212, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/content/gdrive/MyDrive/project_folder/vcc20_baseline_cyclevae/baseline/tools/venv37pt14cu10/lib/python3.7/site-packages/torch/serialization.py", line 193, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'exp/tr50_cyclevae-mult-jnt-mix-scpost_laplace_vcc2020_24kHz_hl1_hu1024_ld32_kse3_dse2_ksd3_dsd2_cyc2_lr1e-4_bs80_do0.5_epoch50_bsu5_bsue35_nwrk2_pad2300/checkpoint-.pkl'
Accounting: time=7 threads=1
Ended (code 1) at Mon Mar 7 13:17:03 UTC 2022, elapsed time 7 seconds
This error occurred after I uncommented these three lines in run_cycleave.sh:
idx_resume # line no.159
${cuda_cmd} ${expdir}/log/train_resume-${idx_resume}.log # line no.405
--resume ${expdir}/checkpoint-${idx_resume}.pkl # line no.434
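The path in the error ends in `checkpoint-.pkl`, which suggests that `${idx_resume}` expanded to an empty string when the `--resume` argument was built. Below is a minimal Python sketch of a check that could run before `torch.load(args.resume)`. The helper names `list_checkpoint_indices` and `resolve_checkpoint` are hypothetical and not part of the training script, and the `checkpoint-<N>.pkl` naming is assumed from the `--resume` line above.

```python
import glob
import os
import re


def list_checkpoint_indices(expdir):
    """Return the sorted indices N of files named checkpoint-<N>.pkl in expdir."""
    indices = []
    for path in glob.glob(os.path.join(expdir, "checkpoint-*.pkl")):
        # Only accept names with a numeric index, so "checkpoint-.pkl" is skipped.
        match = re.fullmatch(r"checkpoint-(\d+)\.pkl", os.path.basename(path))
        if match:
            indices.append(int(match.group(1)))
    return sorted(indices)


def resolve_checkpoint(expdir, idx_resume):
    """Build the resume path and fail early with a readable message."""
    idx = "" if idx_resume is None else str(idx_resume).strip()
    if not idx:
        # This is the case that would produce ".../checkpoint-.pkl".
        raise ValueError(
            "idx_resume is empty; indices found in %s: %s"
            % (expdir, list_checkpoint_indices(expdir))
        )
    path = os.path.join(expdir, "checkpoint-%s.pkl" % idx)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "%s does not exist; indices found: %s"
            % (path, list_checkpoint_indices(expdir))
        )
    return path
```

Calling `resolve_checkpoint(expdir, "")` raises a `ValueError` that lists the checkpoint indices actually present in the experiment directory, which makes it easier to see which value `idx_resume` should be set to in the run script.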
For reference, the full command and start time recorded at the top of train.log:
train_gru_cyclevae-mult-mix-scpost_laplace_batch.py --expdir exp/tr50_cyclevae-mult-jnt-mix-scpost_laplace_vcc2020_24kHz_hl1_hu1024_ld32_kse3_dse2_ksd3_dsd2_cyc2_lr1e-4_bs80_do0.5_epoch50_bsu5_bsue35_nwrk2_pad2300 --feats data/tr50_vcc2020_24kHz/feats.scp --feats_eval_list data/dv50_vcc2020_24kHz/feats_spk-SEF1.scp@data/dv50_vcc2020_24kHz/feats_spk-SEF2.scp@data/dv50_vcc2020_24kHz/feats_spk-TFF1.scp@data/dv50_vcc2020_24kHz/feats_spk-TGF1.scp@data/dv50_vcc2020_24kHz/feats_spk-TMF1.scp --stats_jnt data/tr50_vcc2020_24kHz/stats_jnt.h5 --spk_list SEF1@SEF2@TFF1@TGF1@TMF1 --in_dim 55 --stdim 5 --lr 1e-4 --hidden_units 1024 --batch_size 80 --batch_size_utt 5 --batch_size_utt_eval 35 --stats_list data/tr50_vcc2020_24kHz/stats_spk-SEF1.h5@data/tr50_vcc2020_24kHz/stats_spk-SEF2.h5@data/tr50_vcc2020_24kHz/stats_spk-TFF1.h5@data/tr50_vcc2020_24kHz/stats_spk-TGF1.h5@data/tr50_vcc2020_24kHz/stats_spk-TMF1.h5 --out_dim 50 --lat_dim 32 --n_cyc 2 --kernel_size_enc 3 --dilation_size_enc 2 --kernel_size_dec 3 --dilation_size_dec 2 --epoch_count 50 --hidden_layers 1 --do_prob 0.5 --n_workers 2 --GPU_device 0 --pad_len 2300 --resume exp/tr50_cyclevae-mult-jnt-mix-scpost_laplace_vcc2020_24kHz_hl1_hu1024_ld32_kse3_dse2_ksd3_dsd2_cyc2_lr1e-4_bs80_do0.5_epoch50_bsu5_bsue35_nwrk2_pad2300/checkpoint-.pkl
Started at Mon Mar 7 13:16:56 UTC 2022