NVIDIA / radtts

Provides training, inference and voice conversion recipes for RADTTS and RADTTS++: flow-based TTS models with robust alignment learning, diverse synthesis, and generative modeling with fine-grained control over low-dimensional (F0 and energy) speech attributes.
MIT License

Inference: size mismatch for context_lstm.weight_ih_l0: copying a param with shape torch.Size([2080, 1044]) from checkpoint, the shape in current model is torch.Size([2080, 1040]). #23

Open · jaggzh opened this issue 1 year ago

jaggzh commented 1 year ago

I'm sorry to trouble you. I'm trying to use this project to assist a patient on a ventilator. Right now I'm just trying to get inference working, but I can't figure out what some of the options should be:

CONFIG_PATH=configs/config_ljs_radtts.json
RADTTS_PATH=??
HG_PATH=data/archive/
HG_CONFIG_PATH=data/hifigan_22khz_config.json
TEXT_PATH=test.txt

python inference.py -c $CONFIG_PATH -r $RADTTS_PATH \
    -v $HG_PATH -k $HG_CONFIG_PATH -t $TEXT_PATH -s ljs \
    --speaker_attributes ljs --speaker_text ljs -o results/

I have hifigan_libritts100360_generator0p5.pt.zip unzipped into data/archive/.

I'm not sure what to put for TEXT_PATH, nor what the config or, really, the other options should point to.

Thanks for the help and your time.

jaggzh commented 1 year ago

Okay, I've made some headway:

CONFIG_PATH=configs/config_ljs_radtts.json
RADTTS_PATH=data/radtts/radtts++ljs-dap.pt
HG_PATH=data/hifigan_libritts100360_generator0p5.pt.zip
HG_CONFIG_PATH=data/hifigan_libritts/hifigan_22khz_config.json
TEXT_PATH=test.txt

python inference.py -c $CONFIG_PATH -r $RADTTS_PATH \
    -v $HG_PATH -k $HG_CONFIG_PATH -t $TEXT_PATH -s ljs \
    --speaker_attributes ljs --speaker_text ljs -o results/
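
test.txt is just a plain text file with the sentences to synthesize, one per line; I'm assuming that's the expected format:

This is a test of RADTTS inference.
The quick brown fox jumps over the lazy dog.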

But there's a mismatch:

  File "/home/jaggz/venv/ttsrad/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(

RuntimeError: Error(s) in loading state_dict for RADTTS:
    size mismatch for context_lstm.weight_ih_l0: copying a param with shape torch.Size([2080, 1044]) from checkpoint, the shape in current model is torch.Size([2080, 1040]).
...

Full output is:

/home/jaggz/opt/src/tts/radtts/nvidia/common.py:391: UserWarning: torch.qr is deprecated in favor of torch.linalg.qr and will be removed in a future PyTorch release.
The boolean parameter 'some' has been replaced with a string parameter 'mode'.
Q, R = torch.qr(A, some)
should be replaced with
Q, R = torch.linalg.qr(A, 'reduced' if some else 'complete') (Triggered internally at ../aten/src/ATen/native/BatchLinearAlgebra.cpp:2349.)
  W = torch.qr(torch.FloatTensor(c, c).normal_())[0]
/home/jaggz/venv/ttsrad/lib/python3.9/site-packages/torch/functional.py:1682: UserWarning: torch.lu is deprecated in favor of torch.linalg.lu_factor / torch.linalg.lu_factor_ex and will be removed in a future PyTorch release.
LU, pivots = torch.lu(A, compute_pivots)
should be replaced with
LU, pivots = torch.linalg.lu_factor(A, compute_pivots)
and
LU, pivots, info = torch.lu(A, compute_pivots, get_infos=True)
should be replaced with
LU, pivots, info = torch.linalg.lu_factor_ex(A, compute_pivots) (Triggered internally at ../aten/src/ATen/native/BatchLinearAlgebra.cpp:1915.)
  return torch._lu_with_info(A, pivot=pivot, check_errors=(not get_infos))
Loading vocoder: data/hifigan_libritts100360_generator0p5.pt.zip
Applying spectral norm to text encoder LSTM
Applying spectral norm to context encoder LSTM
Traceback (most recent call last):
  File "/home/jaggz/opt/src/tts/radtts/nvidia/inference.py", line 203, in <module>
    infer(args.radtts_path, args.vocoder_path, args.config_vocoder,
  File "/home/jaggz/opt/src/tts/radtts/nvidia/inference.py", line 97, in infer
    radtts.load_state_dict(state_dict, strict=False)
  File "/home/jaggz/venv/ttsrad/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(

RuntimeError: Error(s) in loading state_dict for RADTTS:
    size mismatch for context_lstm.weight_ih_l0: copying a param with shape torch.Size([2080, 1044]) from checkpoint, the shape in current model is torch.Size([2080, 1040]).
    size mismatch for context_lstm.weight_ih_l0_reverse: copying a param with shape torch.Size([2080, 1044]) from checkpoint, the shape in current model is torch.Size([2080, 1040]).
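
To narrow down which option differs, a small diagnostic script like the one below can list every parameter whose shape disagrees between the checkpoint and the model built from the config. This is only a sketch: it assumes the checkpoint may store its weights under a 'state_dict' key (falling back to the raw object otherwise) and that radtts.py exposes the RADTTS class with a constructor that takes the JSON's model_config, the way inference.py builds it.

import json
import sys

import torch

from radtts import RADTTS  # model class from this repo's radtts.py

config_path, ckpt_path = sys.argv[1], sys.argv[2]

# Build the model from the config (assumption: the JSON has a 'model_config'
# section that maps onto the RADTTS constructor, as in inference.py).
with open(config_path) as f:
    config = json.load(f)
model_sd = RADTTS(**config['model_config']).state_dict()

# The checkpoint may be a plain state dict or a dict wrapping one.
ckpt = torch.load(ckpt_path, map_location='cpu')
ckpt_sd = ckpt.get('state_dict', ckpt) if isinstance(ckpt, dict) else ckpt

# Print every parameter whose shape differs between checkpoint and model.
for name, param in ckpt_sd.items():
    if name in model_sd and model_sd[name].shape != param.shape:
        print(name, tuple(param.shape), '(checkpoint) vs',
              tuple(model_sd[name].shape), '(model)')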
deepglugs commented 7 months ago

Did you ever get this solved? I'm seeing the same thing.

deepglugs commented 7 months ago

Never mind. This looks like a duplicate of https://github.com/NVIDIA/radtts/issues/6; the solution is here: https://github.com/NVIDIA/radtts/issues/6#issuecomment-1191662870
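
For anyone who lands here later: the error reports the checkpoint's context_lstm input weights as [2080, 1044] but the freshly built model's as [2080, 1040], i.e. the model built from the config passed with -c expects a slightly smaller context LSTM input than the checkpoint was trained with, which usually means the inference config doesn't match the checkpoint's training config. A quick way to spot the offending option, assuming you have both JSON files on disk and each has a model_config section, is a small diff like this sketch:

import json
import sys

# Paths are placeholders: the config you pass to inference.py and the config
# the checkpoint was trained with.
cfg_a_path, cfg_b_path = sys.argv[1], sys.argv[2]

with open(cfg_a_path) as f:
    a = json.load(f)['model_config']
with open(cfg_b_path) as f:
    b = json.load(f)['model_config']

# Print every model_config key whose value differs between the two files.
for key in sorted(set(a) | set(b)):
    if a.get(key) != b.get(key):
        print(f'{key}: {a.get(key)!r} vs {b.get(key)!r}')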