NVIDIA / waveglow

A Flow-based Generative Network for Speech Synthesis
BSD 3-Clause "New" or "Revised" License

RuntimeError: Given groups=1, weight of size 256 3 1, expected input[1, 2, 25472] to have 3 channels, but got 2 channels instead #187

Open Brechard opened 4 years ago

Brechard commented 4 years ago

I made the modifications suggested in https://github.com/NVIDIA/waveglow/issues/106 because I want to test the model on my laptop for inference before fine-tuning a model, and I now get this error:

Traceback (most recent call last):
  File "inference.py", line 93, in <module>
    args.sampling_rate, args.is_fp16, args.denoiser_strength)
  File "inference.py", line 61, in main
    audio = waveglow.infer(mel, sigma=sigma)
  File "C:\Users\rodrigo.brechard\PycharmProjects\waveglow\glow.py", line 276, in infer
    output = self.WN[k]((audio_0, spect))
  File "C:\Users\rodrigo.brechard\AppData\Local\Continuum\miniconda3\envs\waveglow\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\rodrigo.brechard\PycharmProjects\waveglow\glow.py", line 155, in forward
    audio = self.start(audio)
  File "C:\Users\rodrigo.brechard\AppData\Local\Continuum\miniconda3\envs\waveglow\lib\site-packages\torch\nn\modules\module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\rodrigo.brechard\AppData\Local\Continuum\miniconda3\envs\waveglow\lib\site-packages\torch\nn\modules\conv.py", line 202, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 256 3 1, expected input[1, 2, 25472] to have 3 channels, but got 2 channels instead

In the glow.py file, I commented out these lines: https://github.com/NVIDIA/waveglow/blob/2fd4e63e2918012f55eac2c8a8e75622a39741be/glow.py#L260-L263 and deleted the .cuda from the else part. I also commented out these lines: https://github.com/NVIDIA/waveglow/blob/2fd4e63e2918012f55eac2c8a8e75622a39741be/glow.py#L285-L290

Apart from that, I was having issues running inference from the command line (I'm using Windows), so I modified the main function so that I can run the code directly from PyCharm.

    parser = argparse.ArgumentParser()
    parser.add_argument('-f', "--filelist_path", required=False)
    parser.add_argument('-w', '--waveglow_path',
                        default='waveglow_256channels_ljs_v3.pt',
                        help='Path to waveglow decoder checkpoint with model')
    parser.add_argument('-o', "--output_dir", default='.', required=False)
    parser.add_argument("-s", "--sigma", default=0.6, type=float)
    parser.add_argument("--sampling_rate", default=22050, type=int)
    parser.add_argument("--is_fp16", default=False, action="store_true")
    parser.add_argument("-d", "--denoiser_strength", default=0.1, type=float,
                        help='Removes model bias. Start with 0.1 and adjust')

    args = parser.parse_args()

    file_list = [str('mel_spectrograms/' + f) for f in os.listdir('mel_spectrograms')]

    main(file_list, args.waveglow_path, args.sigma, args.output_dir,
         args.sampling_rate, args.is_fp16, args.denoiser_strength)

rafaelvalle commented 4 years ago

If you comment out these lines, the model will not work.
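
A minimal sketch of why the channel counts stop matching, written in plain Python with no PyTorch dependency. It assumes that the second commented-out range (glow.py#L285-L290) is the step in infer() that concatenates freshly sampled noise channels onto the audio tensor, and that the checkpoint uses n_flows=12, n_group=8, n_early_every=4 and n_early_size=2; none of these values are stated in this thread. The helper names expected_in_channels and first_mismatch are hypothetical and exist only for this illustration.

```python
# Assumed model settings (not stated in the thread).
N_FLOWS = 12
N_GROUP = 8
N_EARLY_EVERY = 4
N_EARLY_SIZE = 2


def expected_in_channels(n_flows=N_FLOWS, n_group=N_GROUP,
                         n_early_every=N_EARLY_EVERY,
                         n_early_size=N_EARLY_SIZE):
    """Input channels each WN[k].start conv would be built with."""
    remaining = n_group
    expected = []
    for k in range(n_flows):
        # In the forward direction, some channels leave the flow early.
        if k % n_early_every == 0 and k > 0:
            remaining -= n_early_size
        # Each coupling layer feeds half of the remaining channels to WN[k].
        expected.append(remaining // 2)
    return expected


def first_mismatch(inject_noise, n_flows=N_FLOWS, n_group=N_GROUP,
                   n_early_every=N_EARLY_EVERY, n_early_size=N_EARLY_SIZE):
    """Walk the flows in reverse, as inference does.

    Returns (k, expected, got) for the first flow whose input channel
    count is wrong, or None if every flow receives what it expects.
    """
    expected = expected_in_channels(n_flows, n_group,
                                    n_early_every, n_early_size)
    # Inference starts with the channels that remain after the last flow.
    channels = expected[-1] * 2
    for k in reversed(range(n_flows)):
        got = channels // 2  # the half of the audio tensor passed to WN[k]
        if got != expected[k]:
            return k, expected[k], got
        # The step assumed to be commented out: prepend noise channels.
        if inject_noise and k % n_early_every == 0 and k > 0:
            channels += n_early_size
    return None


if __name__ == "__main__":
    print("with noise injection:   ", first_mismatch(inject_noise=True))
    print("without noise injection:", first_mismatch(inject_noise=False))
```

Under these assumptions, skipping the concatenation leaves the audio tensor at 4 channels, so flow 7 receives 2 channels where its first convolution expects 3, which matches the RuntimeError in the traceback. A possible CPU-friendly alternative to commenting the lines out is to keep the concatenation and create the noise tensor on the same device as spect, for example with torch.randn(..., device=spect.device, dtype=spect.dtype); this is a suggestion and is not confirmed in the thread.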