-
Great repo! Ran some tests with it and it sounds good for speech, but the limited testing I did for singing didn't sound too great. Is this expected / is there a way to adapt it to work well with sing…
-
I'm curious if you can share any observations about using iSTFTNet with LibriTTS. The paper implies that the performance of iSTFTNet was insufficient for LibriTTS and so HiFiGAN was adopted, but I was…
-
**Describe the bug**
`time_stretch` does not return the same signal when `rate=1`.
**To Reproduce**
```
import librosa
import matplotlib.pyplot as plt
audio_signal, _ = librosa.load(librosa.ex('…
-
Please check whether this paper is about 'Voice Conversion' or not.
## article info.
- title: **An overview of text-to-speech systems and media applications**
- summary: Producing synthetic voice, s…
-
Thank you very much for the repository - do you have any usage examples for the different tasks such as continuation & editing? :-)
-
Sometimes HuBERT mishears words (phonetically?) and transcribes them incorrectly. Is there a way to manually specify what gets fed in when vocoding?
-
My parameter settings are the same as the ones you provided, but the training results are very different from those in the paper: the training FAD never drops below 1. I would lik…
-
I'm reading through the paper, and I'm wondering: at inference time, could you manipulate the duration predictor, or some other component, to allow controllable elongation of certain phonemes?
…
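For illustration, here is a hypothetical sketch of the general mechanism in FastSpeech-style models (not this repo's API): scale selected entries of the predicted durations before the length regulator repeats each phoneme's frames. All names (`length_regulate`, `stretch`, etc.) are invented for this sketch:

```python
import numpy as np

def length_regulate(frames, durations):
    # frames: (num_phonemes, dim) encoder outputs
    # durations: (num_phonemes,) integer frame counts
    # Repeat each phoneme's frame vector according to its duration.
    return np.repeat(frames, durations, axis=0)

phoneme_frames = np.random.randn(4, 8)   # 4 phonemes, 8-dim features
durations = np.array([3, 5, 2, 4])       # predicted frame counts

stretch = np.ones(4)
stretch[1] = 2.0                         # elongate phoneme 1 by 2x
new_durations = np.round(durations * stretch).astype(int)

out = length_regulate(phoneme_frames, new_durations)
print(out.shape)  # (3 + 10 + 2 + 4, 8) = (19, 8)
```

Whether this works in practice depends on the model exposing the predicted durations before expansion; models that predict durations implicitly would need a different hook.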
-
Speeding up inference only makes sense if it doesn't keep filling up RAM; only then would this be a production-level open-source library. What's the point if one has to delete t…
-
In your paper, you say:
> Recent work confirms that later layers give poorer predictions of pitch, prosody, and speaker identity. Based on these observations, we found that using a layer with high …