NVIDIA / waveglow

A Flow-based Generative Network for Speech Synthesis
BSD 3-Clause "New" or "Revised" License

inference speed on 1080ti #166

Closed xxoospring closed 4 years ago

xxoospring commented 5 years ago

For the sentence "all the amounts will be cleared off within three years by six installments.", Tacotron 2 inference took 732 ms and WaveGlow took 1096 ms. Is that normal? [screenshot]

rafaelvalle commented 5 years ago

Run the same experiment with a longer mel-spectrogram, right before you run out of memory.
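
Something along these lines is enough for that comparison (a sketch only; the random mel tensor, the 80 mel channels, FP16, and the 2000-frame length are placeholders, and `waveglow` is assumed to be the loaded model):

import torch

# Assumed setup: `waveglow` is the published 256-channel checkpoint, already
# moved to the GPU in half precision as in the usual inference scripts.
mel = torch.randn(1, 80, 2000, device='cuda', dtype=torch.half)  # fake "long" mel input

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

with torch.no_grad():
    waveglow.infer(mel, sigma=0.666)          # warm-up run, excluded from timing
    torch.cuda.synchronize()
    start.record()
    audio = waveglow.infer(mel, sigma=0.666)
    end.record()
    torch.cuda.synchronize()

print('WaveGlow: %.1f ms for %d mel frames' % (start.elapsed_time(end), mel.size(2)))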

xxoospring commented 5 years ago

Run the same experiment with a longer mel-spectrogram, right before you run out of memory.

Actually, my TTS service is used to synthesize short phrases such as person names, dates, and addresses, so the mel-spectrograms are short. How can I speed up inference? I also ran a multi-threading experiment: 200 phrases in a single thread took 461 s, while 5 threads (40 phrases each) took 423 s, so there was not much improvement. It seems that one thread occupies almost all of the GPU utilization, and I don't know whether there is a problem with my multi-threading usage. Here is my code:


import os
import time

import numpy as np
import torch
import torch.multiprocessing as mp
from scipy.io.wavfile import write

# Project-local helpers from the Tacotron 2 / WaveGlow repos (import paths assumed);
# resample (librosa-style (y, orig_sr, target_sr)) and silence_trim are the
# author's own utilities and are not shown here.
from hparams import create_hparams
from train import load_model
from text import text_to_sequence
from denoiser import Denoiser


def p_func1(pid, ph):
    tacotron2_model = './fixed_model/tacotron/checkpoint_24000'
    waveglow_model = './fixed_model/waveglow/waveglow_256channels.pt'
    hparams = create_hparams()

    # load Tacotron 2 in half precision
    model = load_model(hparams)
    model.load_state_dict(torch.load(tacotron2_model)['state_dict'])
    model.cuda().eval().half()

    # load WaveGlow in half precision
    waveglow_model = torch.load(waveglow_model)['model']
    waveglow_model.cuda().eval().half()

    # keep the invertible 1x1 convolutions in FP32
    for k in waveglow_model.convinv:
        k.float()

    w_denoiser = Denoiser(waveglow_model)

    for i in range(20):
        wav_name = "%d_%d" % (pid, i)
        sequence = np.array(text_to_sequence(ph, ['transliteration_cleaners']))[None, :]
        sequence = torch.from_numpy(sequence).cuda().long()
        mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)
        # print(mel_outputs_postnet.size())
        with torch.no_grad():
            audio = waveglow_model.infer(mel_outputs_postnet, sigma=0.666)

        # wave post-processing: denoise, rescale, resample to 8 kHz, trim silence
        audio = w_denoiser(audio, strength=0.01)[:, 0]
        wav = audio[0].data.cpu().numpy() * hparams.max_wav_value
        wav = resample(wav.astype(np.float64), hparams.sampling_rate, 8000)
        wav = silence_trim(wav)
        write(os.path.join('/workspace/proj/Deploy-Tacotron2_V2/xunlin/wav_file', wav_name + '.wav'), 8000,
              wav.astype('int16'))
        print(wav_name)


if __name__ == '__main__':
    mp.set_start_method('forkserver', force=True)
    phone_seq = 'AY1+AE1 M+EH1 S+S EH1 N T AH0 N S !+sil'  # "I am a sentence!"
    phone_seq = 'T UW1+B IY1 ,+AO1 R+N AA1 T+T UW1+B IY1 ,+IH1 Z+AH0+K W EH1 S CH AH0 N .+sil'
    num_processes = 1
    processes = []
    st = time.time()
    for rank in range(num_processes):
        p = mp.Process(target=p_func1, args=(rank, phone_seq,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    print('time cons %d ms' % int((time.time() - st) * 1000))

rafaelvalle commented 5 years ago

What inference speed did you get on a longer mel-spectrogram?

xxoospring commented 5 years ago

What inference speed did you get on a longer mel-spectrogram?

20 words -> 4.5 seconds
30 words -> 8.0 seconds
40 words -> "Warning! Reached max decoder steps"

xxoospring commented 5 years ago

What inference speed did you get on a longer mel-spectrogram?

10 words (230 mel-spectrogram frames) -> 2.0 seconds
20 words (450 mel-spectrogram frames) -> 4.5 seconds
30 words (800 mel-spectrogram frames) -> 8.0 seconds
40 words -> "Warning! Reached max decoder steps"

rafaelvalle commented 5 years ago

Can you report just WaveGlow's time, and are you running FP16 and the torch.jitted model?

bryancatanzaro commented 5 years ago

Also, it's important to note that WaveGlow was optimized for GPUs with Tensor Cores, such as the V100 or RTX 2080Ti. Older GPUs like the 1080Ti without Tensor Cores will be much slower.
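
If in doubt, the compute capability PyTorch reports is a quick way to check; Tensor Cores arrived with Volta (compute capability 7.0), while a 1080 Ti reports 6.1. A minimal check:

import torch

# Volta (7.0) and newer have Tensor Cores; Pascal cards such as the GTX 1080 Ti (6.1) do not.
major, minor = torch.cuda.get_device_capability(0)
print('%s: compute capability %d.%d -> Tensor Cores: %s'
      % (torch.cuda.get_device_name(0), major, minor,
         'yes' if (major, minor) >= (7, 0) else 'no'))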


xxoospring commented 5 years ago

Can you report just WaveGlow's time and are you running the torch.jitted model?

[screenshot: per-sentence timing results]

These are the results on my 1080 Ti machine; denoising time is not included in "Waveglow Time".
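
For reference, the WaveGlow-only portion can be isolated roughly like this (a sketch reusing the variable names from the script above; torch.cuda.synchronize() matters because CUDA kernel launches are asynchronous):

import time
import torch

with torch.no_grad():
    mel_outputs, mel_outputs_postnet, _, alignments = model.inference(sequence)

    torch.cuda.synchronize()
    t0 = time.perf_counter()
    audio = waveglow_model.infer(mel_outputs_postnet, sigma=0.666)
    torch.cuda.synchronize()
    waveglow_ms = (time.perf_counter() - t0) * 1000.0

# denoising is done (and, if desired, timed) separately
audio = w_denoiser(audio, strength=0.01)[:, 0]
print('WaveGlow time: %.0f ms' % waveglow_ms)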

rafaelvalle commented 4 years ago

Thank you for reporting these numbers. They illustrate the performance boost obtained from using a GPU with Tensor Cores, such as the V100 or RTX 2080 Ti.