-
### Describe the bug
Hi, I tried to reproduce the CVSS speech translation result, but I got a very high training loss (around 200) at around epoch 18.
I just followed the instructions …
-
Both models were trained with the latest code; there is no need to switch back to 0.0.1.
The first model, synthesizer-merged_110k, was trained jointly on the four datasets supported by the code (aidatatang_200zh, magicdata, aishell3, data_aishell), with learning rate = 0.001 (no decay), batch size = 128, 110k iterations.
The second model, synthesizer-z…
-
I tried setting this up on a fresh Linux install and found that the steps in the README are incomplete for setting it up from scratch. The steps mentioned are:
```
npm install
npm start
# source $VIR…
-
Hello,
Thank you for sharing this dataset.
Would it be possible to have more information on how the audio was generated? In particular, the names of the datasets the vocoders were trained on.
Thank yo…
-
I ran the aishell recipe with this [gst + xvector + tacotron2](https://github.com/espnet/espnet/blob/master/egs2/aishell3/tts1/conf/tuning/train_gst%2Bxvector_tacotron2.yaml) configuration. However, the clo…
-
Has anyone tried to use a different, more recent (and supposedly better) vocoder?
HiFi-GAN is already a bit old, and better options have appeared, like [BigVGAN](https://github.com/NVIDIA/BigVGAN) and maybe …
-
# NVIDIA NeMo (ByT5 G2P and G2P-Conformer):
> NVIDIA NeMo provides grapheme-to-phoneme models for various languages, including **German**.
> The ByT5 G2P model is based on a neural network and can…
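A quick illustration of the input representation the quoted description refers to: ByT5-style models consume raw UTF-8 bytes rather than a learned subword vocabulary, which is why they extend naturally to languages like German with characters such as "ß". The sketch below shows only the byte-to-token-id mapping (the +3 offset reserves ids 0–2 for the pad/eos/unk special tokens, per the ByT5 convention); it involves no NeMo code and is not a working G2P model.

```python
# ByT5 operates directly on UTF-8 bytes: each byte b maps to token id b + 3,
# reserving ids 0-2 for the pad/eos/unk special tokens. No German-specific
# vocabulary is needed, which suits G2P for words like "Strasse" with umlauts/eszett.
word = "Straße"
byte_ids = [b + 3 for b in word.encode("utf-8")]
# "ß" occupies two UTF-8 bytes (0xC3 0x9F), so 6 characters yield 7 token ids
print(byte_ids)
```

Because the vocabulary is just the 256 byte values plus a handful of specials, no tokenizer training or language-specific preprocessing is required.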
-
### 🚀 The feature
The [Modified Discrete Cosine Transform (MDCT)](https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform) is a perfectly invertible transform that can be used for featur…
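To make the "perfectly invertible" claim concrete: with a window satisfying the Princen–Bradley condition (e.g. the sine window), MDCT analysis followed by windowed IMDCT and overlap-add of 50%-overlapping frames reconstructs the interior of the signal exactly, because the time-domain aliasing introduced by each frame cancels against its neighbors. A minimal NumPy sketch (frame length 2N, hop N; variable names are illustrative and not from any torchaudio API):

```python
import numpy as np

N = 64                                    # half the frame length; hop size
n = np.arange(2 * N)
k = np.arange(N)
# MDCT analysis matrix: N coefficients from 2N windowed samples
C = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
# Sine window: satisfies Princen-Bradley, w[n]^2 + w[n+N]^2 = 1
w = np.sin(np.pi / (2 * N) * (n + 0.5))

rng = np.random.default_rng(0)
x = rng.standard_normal(8 * N)            # test signal
xp = np.pad(x, N)                         # pad so edge frames overlap zeros
y = np.zeros_like(xp)
for m in range(len(xp) // N - 1):         # 50%-overlapping frames, hop N
    frame = xp[m * N:(m + 2) * N]
    X = C @ (w * frame)                                 # forward MDCT: 2N -> N
    y[m * N:(m + 2) * N] += (2.0 / N) * w * (C.T @ X)   # windowed IMDCT + overlap-add

print(np.allclose(y[N:-N], x))            # aliasing from adjacent frames cancels
```

Note the transform is 2x-overcomplete in time but critically sampled overall (N coefficients per N-sample hop), which is exactly what makes it attractive for feature extraction compared to a magnitude spectrogram, where phase must be discarded or estimated.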
-
### 🐛 Describe the bug
Description
I'm trying to process a dataset using the extract_features.py script in Python, which uses the NsfHifiGAN model to generate audio features. However, when I run…
-
In the script that extracts features for magphase, it says it typically extracts 60 mag, 45 real, and 45 imag features. I am using 48 kHz audio, just like in the script. So are those numbers correct th…