Open shigabeev opened 1 year ago
A huge thank you for sharing the results. The main reason of using iSTFT here was its fast synthesis speed that it showed from its original VITS variant. As so, I would say the result is far beyond my expectations. Magnificent.
@FENRlR do you know by chance the optimal configs for different sampling rates? I need 16kHz, 24kHz and 48kHz.
Currently, no. It seems there were some issues with 16kHz sampling rate in the original iSTFT repo. I've never seen the other two, however.
@FENRlR hi, can you add me on discord and ping me? (id -> p0p4k)' thanks.
Super neat! Was this on an A100? Looks like it took ~3 days?
I downloaded the model from the web disk you provided, and reported this error when reasoning, do you know how to solve it? RuntimeError: Error(s) in loading state_dict for SynthesizerTrn: size mismatch for enc_p.emb.weight: copying a param with shape torch.Size([155, 192]) from checkpoint, the shape in current model is torch.Size([205, 192]).
I downloaded the model from the web disk you provided, and reported this error when reasoning, do you know how to solve it? RuntimeError: Error(s) in loading state_dict for SynthesizerTrn: size mismatch for enc_p.emb.weight: copying a param with shape torch.Size([155, 192]) from checkpoint, the shape in current model is torch.Size([205, 192]).
Hey, it's possible that the repository have changed and some weight sizes don't match defaults anymore. The easiest way to run it is to go back to the commit that dates back to the time of the post, clone it, plug in the weights and launch it from there.
@Insensiblee Before reverting back to that commit, have you tried changing symbols? The length of symbols he used for Russian is exactly 155, while 205 is the length of the default symbol. So I'm 90% sure that you've forgot to modify it.
Greetings,
First and foremost, I'd like to extend my commendations on developing such an outstanding model; its performance surpasses anything I have personally trained thus far. It's a noteworthy contribution to the field, and I applaud your work.
I've conducted a series of training experiments to validate the efficiency and efficacy of your model. For ease of reference, I've made the training results, model weights, and TensorBoard logs publicly accessible. You can review them via the following Google Drive link: Training Results and Model Weights
Moreover, I've prepared audio samples that compare the performance of your model with that of VITS2, HifiGAN, and BigVGAN. This will offer a comprehensive perspective on how your model stacks up against other state-of-the-art solutions in the domain. Comparative Audio Samples
Best wishes