MasayaKawamura / MB-iSTFT-VITS

Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform
Apache License 2.0
401 stars, 64 forks

Is this normal for these logs to come up? #22

Closed: kafan1986 closed this issue 12 months ago

kafan1986 commented 1 year ago

I just started training using the latest torch version (2.0.1). I am seeing entries like the ones below printed a lot. Is the setup OK, or am I missing something?

Also, I am using my own custom phonemized dataset for training. Can you give a rough indication of the number of steps after which intelligible output can be heard on the TensorBoard eval audio page?

min value is  tensor(-1.3583, device='cuda:0', grad_fn=<MinBackward1>)
max value is  tensor(1.4576, device='cuda:0', grad_fn=<MaxBackward1>)
min value is  tensor(-1.3833, device='cuda:0', grad_fn=<MinBackward1>)
max value is  tensor(1.2478, device='cuda:0', grad_fn=<MaxBackward1>)
min value is  tensor(-1.3127, device='cuda:0', grad_fn=<MinBackward1>)
max value is  tensor(1.2403, device='cuda:0', grad_fn=<MaxBackward1>)
min value is  tensor(-1.2534, device='cuda:0', grad_fn=<MinBackward1>)
max value is  tensor(1.1441, device='cuda:0', grad_fn=<MaxBackward1>)
min value is  tensor(-1.3840, device='cuda:0', grad_fn=<MinBackward1>)
max value is  tensor(1.5967, device='cuda:0', grad_fn=<MaxBackward1>)