NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

CPU inference for DeepLearningExamples/PyTorch/SpeechSynthesis/Tacotron2 ?? #321

Closed welliX closed 4 years ago

welliX commented 4 years ago

Is it possible to run inference.py on a CPU-only device? If so, what steps are needed in detail? I think it would be valuable to be able to test inference (for pre-trained models) on a CPU-only machine when no GPU is available.

ghost commented 4 years ago

It should be possible: you would need to remove any explicit conversions to CUDA. Other changes might also be necessary; I think you also need to disable cuDNN.
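The suggestion above could look roughly like the following. This is a minimal sketch, not the repository's code: it assumes a checkpoint stored as a dict with a 'state_dict' key, and the small `torch.nn.Linear` model and the helper `load_state_dict_on` are hypothetical stand-ins for the real Tacotron2/WaveGlow loading code.

```python
import io

import torch

# Pick the GPU only when one is actually present; otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# cuDNN only matters on CUDA; switching it off is harmless on a CPU-only machine.
torch.backends.cudnn.enabled = False


def load_state_dict_on(model, checkpoint_file, device):
    """Load a checkpoint onto `device`, remapping tensors that were saved on a GPU."""
    checkpoint = torch.load(checkpoint_file, map_location=device)
    model.load_state_dict(checkpoint["state_dict"])
    return model.to(device).eval()


# Stand-in model and in-memory checkpoint (hypothetical; not the Tacotron2 model).
reference = torch.nn.Linear(4, 2)
buffer = io.BytesIO()
torch.save({"state_dict": reference.state_dict()}, buffer)
buffer.seek(0)

model = load_state_dict_on(torch.nn.Linear(4, 2), buffer, device)

with torch.no_grad():
    output = model(torch.zeros(1, 4, device=device))

# Only synchronize when a CUDA device is really in use.
if device.type == "cuda":
    torch.cuda.synchronize()
```

Selecting the device once and passing it around avoids scattering `.cuda()` calls through the code, so the same script can run on either kind of machine.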

welliX commented 4 years ago

Many thanks. I tried, but without success. Is there perhaps a step-by-step procedure to follow? Has nobody tried this before?

ghost commented 4 years ago

Hi @welliX, are you getting any errors when switching the code to CPU-only?

welliX commented 4 years ago

Hi @GrzegorzKarchNV, thanks for coming back!

The final part of the traceback for inference.py is:

  File "/mnt/crypton/home/kiessl/anaconda3/lib/python3.7/subprocess.py", line 488, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/mnt/crypton/home/kiessl/anaconda3/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/mnt/crypton/home/kiessl/anaconda3/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'nvidia-smi': 'nvidia-smi'

As I said, I want to try CPU-only inference and do not have an NVIDIA GPU. It would be great if you could tell me how to prevent the use of 'nvidia-smi'/CUDA/...

In inference.py (and in models.py as well) I replaced cuda with cpu. Here is the diff (RCS format):

diff -n inference.py.org inference.py.new
d119 1
a119 1
    model = models.get_model(model_name, model_config, to_cuda=False, rename=rename)
d122 1
a122 1
    state_dict = torch.load(checkpoint, map_location='cpu')['state_dict']
d164 3
a166 3
    if torch.cpu.is_available():
        text_padded = torch.autograd.Variable(text_padded).cpu().long()
        input_lengths = torch.autograd.Variable(input_lengths).cpu().long()
d180 1
a180 1
    torch.cpu.synchronize()
d184 1
a184 1
    torch.cpu.synchronize()
d218 1
a218 1
    denoiser = Denoiser(waveglow).cpu()
d234 2
a235 2
    dtype=torch.long).cpu()
    input_lengths = torch.IntTensor([sequence.size(1)]).cpu().long()

thanks for any hint!

ghost commented 4 years ago

You need to comment out log_hardware() in https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/SpeechSynthesis/Tacotron2/inference.py#L211. This function calls nvidia-smi to log GPU info.
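As an alternative to commenting the call out, the logging step could be guarded so it only runs when nvidia-smi is actually on the PATH. The sketch below is only an illustration: `gpu_tool_present` and `log_hardware_if_possible` are hypothetical helpers, and `log_fn` stands in for whatever hardware-logging function the script calls.

```python
import shutil


def gpu_tool_present(tool="nvidia-smi"):
    """Return True if `tool` is an executable found on PATH."""
    return shutil.which(tool) is not None


def log_hardware_if_possible(log_fn, tool="nvidia-smi"):
    """Call `log_fn` only when the GPU tool exists; report whether it ran."""
    if gpu_tool_present(tool):
        log_fn()
        return True
    return False


calls = []
# A tool name that should not exist anywhere, to exercise the CPU-only path.
ran = log_hardware_if_possible(
    lambda: calls.append("logged"), tool="no-such-gpu-tool-xyz"
)
```

On a machine without the tool, the logging function is simply skipped instead of raising FileNotFoundError.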

welliX commented 4 years ago

Cool, many thanks, one step further: inference.py now starts! However, an error is now thrown due to a shape mismatch (by a factor of 4, see below). The checkpoint models were created (and written) by train.py, which however ran on a different machine using GPU/CUDA. Can this be the reason for the mismatch, and is there a way to convert them into the right format? I already tried some other parameters, such as running with and without --amp-run, without success. Many thanks in advance!

:::NVLOGv0.2.2 Tacotron2_PyT 1576077661.701038122 (/mnt/allhome/TMP/work/WaveGlow/NVIDIA_DeepLearningExamples_PyTorch_SpeechSynthesis_Tacotron2/DeepLearningExamples/PyTorch/SpeechSynthesis/Tacotron2/dllogger/logger.py:251) args: {"input": txt", "output": "output/", "tacotron2": "/TMP/work/WaveGlow/PretrainedModels/OwnTrainings/checkpoint_Tacotron2_350", "waveglow": "/TMP/work/WaveGlow/PretrainedModels/OwnTrainings/checkpoint_WaveGlow_1000", "sigma_infer": 0.9, "denoising_stsampling_rate": 22050, "amp_run": true, "log_file": "nvlog.json", "include_warmup": false, "stft_hop_length": 256}
Traceback (most recent call last):
  File "inference.py", line 275, in <module>
    main()
  File "inference.py", line 217, in main
    args.amp_run)
  File "inference.py", line 126, in load_and_setup_model
    model.load_state_dict(state_dict)
  File "/mnt/crypton/home/kiessl/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 839, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for WaveGlow:
    size mismatch for WN.0.in_layers.0.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
    size mismatch for WN.0.in_layers.0.weight_g: copying a param with shape torch.Size([256, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 1, 1]).
    size mismatch for WN.0.in_layers.0.weight_v: copying a param with shape torch.Size([256, 128, 3]) from checkpoint, the shape in current model is torch.Size([1024, 512, 3]).
    .....
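Moving a tensor between GPU and CPU does not change its shape, so a size mismatch like the one above more likely means the model was built with a different configuration than the one the checkpoint was trained with. A small helper can list the differing parameters before load_state_dict is called. This is a sketch using plain dicts of shapes taken from the error message; `find_shape_mismatches` is a hypothetical name. With PyTorch, such a dict can be built with `{k: tuple(v.shape) for k, v in state_dict.items()}`.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return (name, checkpoint_shape, model_shape) for every differing parameter."""
    mismatches = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        # Parameters missing from the model are ignored here; only shapes are compared.
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches.append((name, tuple(ckpt_shape), tuple(model_shape)))
    return mismatches


# Shapes copied from the error message in this thread.
checkpoint_shapes = {
    "WN.0.in_layers.0.bias": (256,),
    "WN.0.in_layers.0.weight_g": (256, 1, 1),
    "WN.0.in_layers.0.weight_v": (256, 128, 3),
}
model_shapes = {
    "WN.0.in_layers.0.bias": (1024,),
    "WN.0.in_layers.0.weight_g": (1024, 1, 1),
    "WN.0.in_layers.0.weight_v": (1024, 512, 3),
}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, ckpt_shape, model_shape in mismatches:
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```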

welliX commented 4 years ago

Any ideas? Could the format mismatch be because the training (train.py) was run on GPU/CUDA whereas inference.py is run on CPU? Is there a way to convert the models into the right format?

welliX commented 4 years ago

Got it!! With the models JoC_Tacotron2_FP16_PyT_20190306 and JoC_WaveGlow_FP16_PyT_20190306 I could make it run on my CPU-only laptop:

    python inference.py --tacotron2 $tacotronCP --waveglow $waveglowCP -o output/ -i phrases/phrase.txt

Besides inference.py, waveglow/denoiser.py and tacotron2/model.py also needed the cuda => cpu adaptation.
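To find the remaining places that need this adaptation, the source files can be scanned for explicit CUDA references. The following is a rough sketch only: `find_cuda_references` and its pattern are hypothetical, and the sample text is made up for illustration rather than copied from the repository.

```python
import re

# Matches calls such as x.cuda(), torch.cuda.synchronize(), and to_cuda=... arguments.
CUDA_PATTERN = re.compile(r"\.cuda\(|torch\.cuda\.|to_cuda")


def find_cuda_references(source_text):
    """Return (line_number, line) pairs for lines that mention CUDA explicitly."""
    hits = []
    for number, line in enumerate(source_text.splitlines(), start=1):
        if CUDA_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


# Illustrative sample; in practice, pass the contents of each .py file.
sample = """\
denoiser = Denoiser(waveglow).cuda()
sigma = 0.9
torch.cuda.synchronize()
"""

hits = find_cuda_references(sample)
for number, line in hits:
    print(f"line {number}: {line}")
```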

@GrzegorzKarchNV - many thanks for your support!

hungnvk54 commented 4 years ago

Hi @welliX. Can you share the performance you get when running inference on CPU? Does it run in real time?

welliX commented 4 years ago

> Can you share the performance you get when running inference on CPU? Does it run in real time?

Acoustic quality is quite OK.

Real time? Of course not. On my machine (Asus laptop with 4 CPUs - Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz, x86_64), about 18 sec of user time is needed for 1 sec of TTS speech.
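These figures can be expressed as a real-time factor (compute time divided by audio duration), where values at or below 1 mean real-time synthesis. A minimal sketch, assuming the 18 sec and 1 sec figures quoted above; note that user time is CPU time summed over all cores, so the wall-clock factor on a multi-core machine may be lower. `real_time_factor` is a hypothetical helper.

```python
def real_time_factor(compute_seconds, audio_seconds):
    """Return compute time per second of generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return compute_seconds / audio_seconds


# Figures quoted in the comment above: about 18 s of user time per 1 s of speech.
rtf = real_time_factor(18.0, 1.0)
is_real_time = rtf <= 1.0
print(f"real-time factor: {rtf:.1f}, real time: {is_real_time}")
```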