SJTMusicTeam / Muskits

An open-source music processing toolkit
Apache License 2.0

RuntimeError: repeats must have the same size as input along dim (issue #126)

Closed: r9y9 closed this issue 2 years ago

r9y9 commented 2 years ago

I am trying to train models with duration prediction but got the following error:

2022-09-09 23:54:22,490 (trainer:299) INFO: 1/500epoch started
/home/ryuichi/anaconda3/envs/py38_espnet/lib/python3.8/site-packages/torch/functional.py:606: UserWarning: stft will soon require the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release. (Triggered internally at  /opt/conda/conda-bld/pytorch_1659484810403/work/aten/src/ATen/native/SpectralOps.cpp:800.)
  return _VF.stft(input, n_fft, hop_length, win_length, window,  # type: ignore[attr-defined]
/home/ryuichi/sp/Muskits/muskit/layers/stft.py:109: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  olens = (ilens - self.win_length) // self.hop_length + 1
/home/ryuichi/sp/Muskits/muskit/svs/feats_extract/score_feats_extract.py:115: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  olens = (input_lengths - self.win_length) // self.hop_length + 1
Traceback (most recent call last):
  File "/home/ryuichi/anaconda3/envs/py38_espnet/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ryuichi/anaconda3/envs/py38_espnet/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ryuichi/sp/Muskits/muskit/bin/svs_train.py", line 21, in <module>
    main()
  File "/home/ryuichi/sp/Muskits/muskit/bin/svs_train.py", line 17, in main
    SVSTask.main(cmd=cmd)
  File "/home/ryuichi/sp/Muskits/muskit/tasks/abs_task.py", line 1054, in main
    cls.main_worker(args)
  File "/home/ryuichi/sp/Muskits/muskit/tasks/abs_task.py", line 1341, in main_worker
    cls.trainer.run(
  File "/home/ryuichi/sp/Muskits/muskit/train/trainer.py", line 305, in run
    all_steps_are_invalid = cls.train_one_epoch(
  File "/home/ryuichi/sp/Muskits/muskit/train/trainer.py", line 519, in train_one_epoch
    retval = model(**batch)
  File "/home/ryuichi/anaconda3/envs/py38_espnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ryuichi/sp/Muskits/muskit/svs/muskit_model.py", line 361, in forward
    return self.svs(**batch)
  File "/home/ryuichi/anaconda3/envs/py38_espnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ryuichi/sp/Muskits/muskit/svs/naive_rnn/naive_rnn_dp.py", line 380, in forward
    hs = self.length_regulator(hs, ds)  # (B, seq_len, eunits)
  File "/home/ryuichi/anaconda3/envs/py38_espnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ryuichi/sp/Muskits/muskit/layers/fastspeech/length_regulator.py", line 59, in forward
    repeat = [torch.repeat_interleave(x, d, dim=0) for x, d in zip(xs, ds)]
  File "/home/ryuichi/sp/Muskits/muskit/layers/fastspeech/length_regulator.py", line 59, in <listcomp>
    repeat = [torch.repeat_interleave(x, d, dim=0) for x, d in zip(xs, ds)]
RuntimeError: repeats must have the same size as input along dim

To reproduce, run the following command in ofuton_p's or ritsu's recipe:

CUDA_VISIBLE_DEVICES="0" ./run.sh  --stage 6 --stop-stage 6 --train_config conf/tuning/train_naive_rnn_dp.yaml

It seems like a bug. I didn't change any parameters, just used configs in this repository. Could you check it?

ftshijt commented 2 years ago

Many thanks for reporting the bug; we will look into it soon.

ftshijt commented 2 years ago

Just to make sure, which database are you using?

r9y9 commented 2 years ago

I use Namine Ritsu's and ofuton_p's databases.

r9y9 commented 2 years ago

Hi, any updates on this? I am unable to reproduce results for RNN (w/ G.T. Dur) and Transformer (w/ G.T. Dur) described in the paper https://arxiv.org/abs/2205.04029.

ftshijt commented 2 years ago

We have reproduced the same bug on the latest branch and are trying to fix it. Thanks for following up.

ftshijt commented 2 years ago

Hi, sorry, I just reviewed the issue and found that the problem comes from the feature extractor. Models with a duration predictor need to be trained with syllable_score_feats instead of the default frame_score_feats. If you open run.sh, you can see the option there; it defaults to frame_score_feats.
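A minimal sketch of why the two feature types behave differently, assuming the length regulator receives one duration per syllable. The `repeat_interleave` function below is a simplified pure-Python stand-in for `torch.repeat_interleave(x, d, dim=0)` (it ignores the single-element broadcast case), and the syllables and durations are made-up values, not data from the recipes.

```python
def repeat_interleave(xs, ds):
    """Simplified stand-in for torch.repeat_interleave(x, d, dim=0).

    Repeats xs[i] ds[i] times. Like the PyTorch call, it requires one
    repeat count per input element along the repeated dimension.
    """
    if len(xs) != len(ds):
        raise RuntimeError("repeats must have the same size as input along dim")
    return [x for x, d in zip(xs, ds) for _ in range(d)]


# Syllable-level input: one item per syllable, one duration per syllable.
syllables = ["sa", "ku", "ra"]
durations = [2, 1, 3]  # frames per syllable (illustrative values)
print(repeat_interleave(syllables, durations))
# ['sa', 'sa', 'ku', 'ra', 'ra', 'ra']

# Frame-level input: the score is already expanded to one item per frame,
# so it no longer lines up with the per-syllable durations.
frames = ["sa", "sa", "ku", "ra", "ra", "ra"]
try:
    repeat_interleave(frames, durations)
except RuntimeError as err:
    print(err)  # same message as in the traceback above
```

With syllable-level features the two sequences have equal length and the expansion succeeds; with frame-level features they do not, which matches the reported RuntimeError.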

ftshijt commented 2 years ago

It is definitely worth adding a better warning message, though. Thanks for bringing that up.
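One way such a message could look, as a rough sketch only: the function name `check_duration_alignment` and its arguments are hypothetical and are not part of Muskits. The idea is to compare the two lengths before the length regulator runs and to name the feature type in the error.

```python
def check_duration_alignment(num_inputs, num_durations, feats_type):
    """Raise a descriptive error when inputs and durations do not line up.

    Hypothetical helper for illustration; all names here are assumptions.
    num_inputs:    number of encoder steps passed to the length regulator
    num_durations: number of duration values for the same utterance
    feats_type:    score feature type used for extraction
    """
    if num_inputs != num_durations:
        raise ValueError(
            f"Length regulator got {num_inputs} input steps but "
            f"{num_durations} durations. Score features were extracted as "
            f"'{feats_type}'; models with a duration predictor expect "
            "'syllable_score_feats'."
        )


# Aligned lengths pass silently.
check_duration_alignment(3, 3, "syllable_score_feats")

# Mismatched lengths produce a message that points at the likely cause.
try:
    check_duration_alignment(6, 3, "frame_score_feats")
except ValueError as err:
    print(err)
```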

r9y9 commented 2 years ago

Thank you for the quick response! I'll try to see if it works with syllable_score_feats!

r9y9 commented 2 years ago

It worked, thank you again!