Closed xxoospring closed 4 years ago
Run the same experiment with a longer mel-spectrogram, right before you run out of memory.
Run the same experiment with a longer mel-spectrogram, right before you run out of memory.
Actually, my TTS service is used to synthesize short phrases such as person names, dates, and addresses, so the mel-spectrograms are short. How can I speed up inference? I also ran a multi-thread experiment: 200 phrases in a single thread take 461 sec, versus 423 sec with 5 threads (40 phrases per thread), so there is not much improvement from multi-threading. It seems a single thread already occupies almost all of the GPU utilization, and I don't know if there is a problem with my multi-threading usage. Here is my code:
# Assumed imports (not in the original post), based on the NVIDIA tacotron2/waveglow repos;
# `resample` and `silence_trim` are the poster's own helpers and are not imported here.
import os
import time
import numpy as np
import torch
import torch.multiprocessing as mp
from scipy.io.wavfile import write
from hparams import create_hparams   # tacotron2
from train import load_model         # tacotron2
from text import text_to_sequence    # tacotron2
from denoiser import Denoiser        # waveglow


def p_func1(pid, ph):
    tacotron2_model = './fixed_model/tacotron/checkpoint_24000'
    waveglow_model = './fixed_model/waveglow/waveglow_256channels.pt'
    hparams = create_hparams()
    model = load_model(hparams)
    model.load_state_dict(torch.load(tacotron2_model)['state_dict'])
    model.cuda().eval().half()
    waveglow_model = torch.load(waveglow_model)['model']
    waveglow_model.cuda().eval().half()
    for k in waveglow_model.convinv:
        k.float()
    w_denoiser = Denoiser(waveglow_model)
    for i in range(20):
        wav_name = "%d_%d" % (pid, i)
        sequence = np.array(text_to_sequence(ph, ['transliteration_cleaners']))[None, :]
        sequence = torch.from_numpy(sequence).cuda().long()
        mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)
        # print(mel_outputs_postnet.size())
        with torch.no_grad():
            audio = waveglow_model.infer(mel_outputs_postnet, sigma=0.666)
        # wave post process
        audio = w_denoiser(audio, strength=0.01)[:, 0]
        wav = audio[0].data.cpu().numpy() * hparams.max_wav_value
        wav = resample(wav.astype(np.float64), hparams.sampling_rate, 8000)
        wav = silence_trim(wav)
        write(os.path.join('/workspace/proj/Deploy-Tacotron2_V2/xunlin/wav_file', wav_name + '.wav'), 8000,
              wav.astype('int16'))
        print(wav_name)


if __name__ == '__main__':
    mp.set_start_method('forkserver', force=True)
    phone_seq = 'AY1+AE1 M+EH1 S+S EH1 N T AH0 N S !+sil'  # "I am a sentence!"
    phone_seq = 'T UW1+B IY1 ,+AO1 R+N AA1 T+T UW1+B IY1 ,+IH1 Z+AH0+K W EH1 S CH AH0 N .+sil'
    num_processes = 1
    processes = []
    st = time.time()
    for rank in range(num_processes):
        p = mp.Process(target=p_func1, args=(rank, phone_seq,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    print('time cons %d ms' % int((time.time() - st) * 1000))
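The "longer mel-spectrogram" experiment suggested above, together with the later request to report just WaveGlow's time, can be approximated with a small helper like the sketch below. This is only an illustration: time_waveglow is a hypothetical name, it assumes the waveglow_model and a mel_outputs_postnet tensor produced inside p_func1 above, and the torch.cuda.synchronize() calls are there so that only finished GPU work is counted.

# Hypothetical helper (not from the original post): time only WaveGlow's infer call
# on a mel-spectrogram tiled along the time axis to emulate a longer utterance.
def time_waveglow(waveglow_model, mel, repeats=4):
    long_mel = mel.repeat(1, 1, repeats)   # (batch, n_mel_channels, frames * repeats)
    torch.cuda.synchronize()               # finish any pending GPU work before starting the clock
    start = time.time()
    with torch.no_grad():
        audio = waveglow_model.infer(long_mel, sigma=0.666)
    torch.cuda.synchronize()               # wait for the vocoder kernels to finish
    print('%d mel frames -> %.3f s (WaveGlow only)' % (long_mel.size(2), time.time() - start))
    return audio

Tiling like this is only a rough proxy for genuinely longer utterances, but it is enough to see how per-call overhead is amortized as the mel-spectrogram grows.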
What inference speed did you get on a longer mel-spectrogram?
What inference speed did you get on a longer mel-spectrogram?
20 words -> 4.5 seconds, 30 words -> 8.0 seconds, 40 words -> "Warning! Reached max decoder steps"
What inference speed did you get on a longer mel-spectrogram?
20 words -> 4.5 seconds, 30 words -> 8.0 seconds, 40 words -> "Warning! Reached max decoder steps"
10 words (230 mel spec frames) -> 2.0 seconds
20 words (450 mel spec frames) -> 4.5 seconds
30 words (800 mel spec frames) -> 8.0 seconds
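To put these numbers in context, here is a quick real-time-factor calculation. It is a sketch that assumes the default Tacotron2 hparams (22050 Hz sampling rate, hop length 256), which may not match the poster's configuration:

# Back-of-the-envelope check of the reported timings, assuming 22050 Hz audio and a
# mel hop length of 256 samples (the Tacotron2 defaults); the poster's settings may differ.
for frames, seconds in [(230, 2.0), (450, 4.5), (800, 8.0)]:
    audio_seconds = frames * 256 / 22050.0   # audio duration represented by the mel-spectrogram
    rtf = seconds / audio_seconds            # real-time factor (lower is faster)
    print('%d frames: %.1f s of audio generated in %.1f s -> RTF %.2f' % (frames, audio_seconds, seconds, rtf))

Under those assumptions the reported times correspond to a real-time factor of roughly 0.75 to 0.86, i.e. generation is only slightly faster than real time on this setup and does not get relatively slower as the utterance grows.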
Can you report just WaveGlow's time and are you running the torch.jitted model?
Also, it's important to note that WaveGlow was optimized for GPUs with Tensor Cores, such as the V100 or RTX 2080Ti. Older GPUs like the 1080Ti without Tensor Cores will be much slower.
Can you report just WaveGlow's time and are you running the torch.jitted model?
This is the experiment on my 1080 Ti machine; denoise time was not included in "Waveglow Time".
Thank you for reporting these numbers. They show evidence of the performance boost obtained from using a GPU with Tensor Cores, such as the V100 or RTX 2080Ti.
sentence = "all the amounts will be cleared off within three years by six installments." Tacotron2 inference cost 732 ms and WaveGlow 1096 ms; is that normal?