-
**Describe your question**
I am training a TTS model using FastSpeech2. I started training my model about 6 hours ago (~40,000 steps) and the loss dropped from `~4` down to `~0.8`. I tried running …
-
I have been testing aTrain on both mp3 and m4a files, and I noticed that it systematically crashes silently towards the end of the transcription, with no transcription output, only the wav file and me…
-
This needs a review of which WAV header fields need to change.
NAudio also doesn't seem to support it.
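For reference when reviewing the fields, here is a minimal sketch that builds the canonical 44-byte PCM WAV header; the comments flag the fields that typically need to change when the encoding changes. The function name and default parameters are illustrative, not from any existing codebase.

```python
import struct

def pcm_wav_header(num_samples, sample_rate=16000, channels=1, bits=16):
    """Build a canonical 44-byte PCM WAV (RIFF) header.

    AudioFormat, BlockAlign, ByteRate, and BitsPerSample are the fields
    that depend on the encoding and would need review for a new format.
    """
    byte_rate = sample_rate * channels * bits // 8
    block_align = channels * bits // 8
    data_size = num_samples * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size,   # ChunkID, ChunkSize
        b"WAVE",
        b"fmt ", 16,               # Subchunk1ID, Subchunk1Size (16 for PCM)
        1,                         # AudioFormat: 1 = PCM (changes per codec)
        channels,
        sample_rate,
        byte_rate,                 # SampleRate * NumChannels * BitsPerSample/8
        block_align,               # NumChannels * BitsPerSample/8
        bits,
        b"data", data_size,        # Subchunk2ID, Subchunk2Size
    )
```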
-
I'd like to raise a concern about how quantization is currently handled in SpeechBrain. While training my own k-means quantizer on the last layer of an ASR model, I noticed that the interface was not …
-
## Dataset Format
The pre-processing script expects data to be a directory with:
* `metadata.csv` - CSV file with text, audio filenames, and speaker names
* `wav/` - directory with audio files
The …
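As a sanity check, a minimal stdlib-only sketch of reading this layout; the column order (text, audio filename, speaker name) is assumed from the description above, and the actual pre-processing script may differ.

```python
import csv
from pathlib import Path

def load_metadata(root):
    """Pair each metadata.csv row with its audio file under wav/.

    Assumes plain comma-separated rows: text, audio filename, speaker.
    """
    root = Path(root)
    entries = []
    with open(root / "metadata.csv", newline="", encoding="utf-8") as f:
        for text, wav_name, speaker in csv.reader(f):
            entries.append({
                "text": text,
                "wav": root / "wav" / wav_name,
                "speaker": speaker,
            })
    return entries
```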
-
It seems from the code that the datasets have to be in .tar.gz format for the train/validation/test to work. We need to pass the data as a datamodule, as seen in main.py line 378: `result = trainer.test(n…`
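If the .tar.gz requirement stands, a minimal stdlib-only sketch of packing a dataset directory into that format (function name is illustrative; the exact archive layout the loader expects should be checked against the code):

```python
import tarfile
from pathlib import Path

def pack_dataset(src_dir, out_path):
    """Archive a dataset directory as a gzip-compressed tarball,
    keeping the directory name as the top-level entry."""
    src = Path(src_dir)
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(src, arcname=src.name)
```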
-
segments, _ = model.transcribe(
wav_name+'.wav',
language="zh",
)
e.g. output "二零一四" rather than "2014"
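Not a fix in the model itself, but as an illustration of the desired output, a minimal post-processing sketch that maps digit-by-digit Chinese numerals (as in the example above) back to Arabic digits; positional forms (e.g. with 千/百/十) would need a real numeral normalizer.

```python
# Digit-by-digit mapping only; everything else passes through unchanged.
CN_DIGITS = {"零": "0", "一": "1", "二": "2", "三": "3", "四": "4",
             "五": "5", "六": "6", "七": "7", "八": "8", "九": "9"}

def digits_to_arabic(text):
    """Replace each Chinese digit character with its Arabic equivalent."""
    return "".join(CN_DIGITS.get(ch, ch) for ch in text)
```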
-
Great work! I want to ask whether you have tried using mel spectrograms as input. If mel is used as input and the same bitrate is maintained (e.g. `frameshift=256`, encoder downsampled by a factor of 3), will the performance o…
-
This refers to this section of the instruction documentation. It is not really an issue, but rather a documentation improvement/observation.
"Must contain a base-level folder called "LightSho…
-
ValueError: You are trying to return timestamps, but the generation config is not properly set. Make sure to initialize the generation config with the correct attributes that are needed such as `no_ti…
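This error usually comes from the Transformers Whisper generation path when timestamps are requested but the model's generation config lacks the timestamp-related attributes. A common remedy, assuming a standard Whisper checkpoint (the checkpoint name below is a placeholder, not from the report), is to reload the generation config from the pretrained checkpoint before generating:

```python
from transformers import GenerationConfig, WhisperForConditionalGeneration

# Placeholder checkpoint; substitute the one actually in use.
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

# Reload the generation config from the same checkpoint so the
# timestamp attributes are populated before calling
# model.generate(..., return_timestamps=True).
model.generation_config = GenerationConfig.from_pretrained("openai/whisper-tiny")
```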