-
Hello!
I am trying to train a custom TTS model on the VCTK dataset with a Conformer FastSpeech 2 setup. After 50 epochs (with a batch size of 2), I noticed that the outputs of the decoder (befor…
-
Hello!
I am using the following code:
```
from hear21passt.base import get_basic_model, get_model_passt
import torch
# get the PaSST model wrapper, includes Melspectrogram and the default pre-tr…
```
-
### 🚀 The feature
The [Modified Discrete Cosine Transform (MDCT)](https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform) is a perfectly invertible transform that can be used for featur…
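To make the proposal concrete, here is a minimal NumPy sketch of a windowed MDCT with 50% overlap and its inverse. The sine window satisfies the Princen–Bradley condition, so time-domain aliasing cancels and overlap-add reconstruction is exact; the function names and padding convention are my own, not a proposed torchaudio API:

```python
import numpy as np

def _mdct_basis(N):
    # N x 2N cosine basis: C[k, n] = cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))

def _sine_window(N):
    # Satisfies the Princen-Bradley condition, so TDAC reconstruction is exact.
    return np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))

def mdct_frames(x, N):
    """MDCT with hop N (50% overlap); len(x) must be a multiple of N."""
    C, w = _mdct_basis(N), _sine_window(N)
    xp = np.concatenate([np.zeros(N), x, np.zeros(N)])   # pad half a frame
    frames = np.stack([xp[i:i + 2 * N] for i in range(0, len(x) + N, N)])
    return (frames * w) @ C.T                            # (n_frames, N)

def imdct_overlap_add(X, N, length):
    """Inverse MDCT followed by windowed overlap-add."""
    C, w = _mdct_basis(N), _sine_window(N)
    frames = (2.0 / N) * (X @ C) * w
    y = np.zeros(length + 2 * N)
    for i, f in enumerate(frames):
        y[i * N:i * N + 2 * N] += f                      # aliasing cancels here
    return y[N:N + length]                               # drop the padding
```

Note the analysis frames are not individually invertible (the transform maps 2N samples to N coefficients); perfect reconstruction only emerges once overlapping synthesis frames are summed.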
-
(env) (base) C:\Users\prost\Wav2Lip>python inference.py --checkpoint_path checkpoints/wav2lip_gan.pth --face joseph.mp4 --audio josephvoice.mp3
Using cpu for inference.
Reading video frames...
Numb…
-
Hello, in the `forward_v2` function of `FApredictors` in `FAcodec/modules/quantize.py`, the line
`spk_pred = self.timbre_predictor(timbre)[0]`
is commented out, so `timbre` is `None`, which later breaks
```
spk_pred_logits = preds['timbre']
spk_loss …
```
-
## 🚀 Feature
Given the lack of small, comprehensive audio tasks, I propose adding a speech MNIST dataset to torchaudio.
## Motivation
In the audio domain, we often lack s…
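To make the interface concrete, here is a toy sketch of what such a dataset could expose. The synthetic sine-tone "digits" below only stand in for real recordings, and every name is hypothetical rather than an actual torchaudio API:

```python
import numpy as np

class ToySpeechMNIST:
    """Hypothetical MNIST-style audio dataset: (waveform, digit-label) pairs.
    Real recordings are replaced by sine tones whose pitch encodes the digit."""

    def __init__(self, n_items=100, sr=8000, duration=0.5, seed=0):
        self.sr = sr
        self.n_samples = int(sr * duration)
        rng = np.random.default_rng(seed)
        self.labels = rng.integers(0, 10, size=n_items)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        label = int(self.labels[idx])
        t = np.arange(self.n_samples) / self.sr
        freq = 200.0 + 50.0 * label          # digits 0..9 -> 200..650 Hz
        waveform = np.sin(2 * np.pi * freq * t).astype(np.float32)
        return waveform, label
```

A real version would of course download a spoken-digit corpus instead of synthesizing tones; the point is only the small, MNIST-like `(waveform, label)` interface.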
-
Thanks for your outstanding work.
I'm just getting started with speech signal processing and I have a question. In the example in the README, the input is a 1024×128 image; how should we get t…
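If the 1024×128 input is a log-mel spectrogram (time frames × mel bins), it is typically computed from the raw waveform roughly as below; the sample rate, FFT size, hop, and mel count here are illustrative guesses, not the repository's actual settings:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centers evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for j in range(l, c):
            fb[i, j] = (j - l) / max(c - l, 1)   # rising edge
        for j in range(c, r):
            fb[i, j] = (r - j) / max(r - c, 1)   # falling edge
    return fb

def log_mel(x, sr=16000, n_fft=1024, hop=256, n_mels=128):
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)                   # (n_frames, n_mels)
```

With these parameters, about 16 s of 16 kHz audio would yield roughly 1024 frames of 128 mel bins, i.e. the 1024×128 "image" shape.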
-
There should be functionality to download only part of a dataset and train on just that, instead of having to download the entire dataset. And if not, then the documentati…
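As a workaround until partial downloads exist, a streaming loader can simply be cut off after the first N examples. In this sketch, `fake_remote_examples` is a stand-in for a real download iterator, not an actual API:

```python
from itertools import islice

def fake_remote_examples():
    # Pretend each yielded item was fetched on demand from a remote dataset.
    i = 0
    while True:
        yield {"id": i, "waveform": [0.0] * 160}
        i += 1

# Materialize only the first 100 examples; nothing past that is "downloaded".
subset = list(islice(fake_remote_examples(), 100))
```

The same `islice` trick works on any iterable-style dataset, so training loops can consume a small prefix of the data without touching the rest.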
-
I use 16 kHz mono audio. A 6.7025 s wav gives a mel of 167 frames → 167 × 40 ms = 6.68 s.
Likewise, a 6 s wav gives a mel of 147 frames = 5.88 s. Why?
Issue: the audio plays in full, but the mel does not cover the whole duration.
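The shortfall is just integer division: with a 40 ms hop, only complete hops produce mel frames, so the trailing remainder of the file (anything shorter than one hop) is dropped. A quick check, assuming no padding:

```python
sr = 16000
hop = int(0.040 * sr)          # 40 ms hop -> 640 samples at 16 kHz

def mel_frames(n_samples, hop):
    # Without padding, only complete hops yield frames; the remainder is lost.
    return n_samples // hop

n = int(6.7025 * sr)           # 107240 samples
f = mel_frames(n, hop)         # 167 frames
covered = f * hop / sr         # 6.68 s, slightly less than 6.7025 s
```

The exact count additionally depends on the window length and padding mode of the particular extractor, which can remove a few more frames, so a slight mismatch between audio duration and frames × hop is expected.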
-
This is the error I get when running `python -m datasets.generate_data ./datasets/son/alignment.json`. I've resolved the other problems by searching past issues and the web, but I can't figure this one out at all. If you don't mind, could you share how to fix it?
Also, where is `n_frame`, the item at the bottom of the error, used…