-
after using "python vad_model" I got a .h5 trained model; train accuracy is 0.8491. But when I use detector.py on an audio file that contains no speech, the results are all bigger than 0.9, mean=0.9 std=0.…
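One thing worth ruling out first is a feature-scaling mismatch between training and detection: if detector.py computes features without applying the same mean/std statistics used during training, a sigmoid output can saturate near 1.0 on any input. A minimal numpy sketch of the idea (all names here are hypothetical, not taken from vad_model or detector.py):

```python
import numpy as np

def normalize(features, mean, std):
    # apply the SAME statistics that were computed on the training set
    return (features - mean) / (std + 1e-8)

rng = np.random.default_rng(0)

# training time: compute the stats once and save them alongside the model
train_feats = rng.normal(loc=3.0, scale=5.0, size=(100, 13))
mu, sigma = train_feats.mean(axis=0), train_feats.std(axis=0)

# inference time (e.g. in detector.py): reuse the saved stats,
# never recompute them on the clip being classified
test_feats = rng.normal(loc=3.0, scale=5.0, size=(10, 13))
normed = normalize(test_feats, mu, sigma)
```

If the detector skips this step, the model sees inputs far outside the distribution it was trained on, and near-constant high scores are a typical symptom.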
-
I want to use FFmpeg to send input to openSMILE and generate features from the eGeMAPS, prosody, or MFCC configurations.
I am able to modify the config files to get the live input, but now I want to take the inp…
-
![error](https://user-images.githubusercontent.com/60388110/80283933-db516e80-874d-11ea-9e2c-14d5ea2f3287.JPG)
Hi, I am currently working on the second phase of my experiment using your source code. Than…
-
@synesthesiam Once again your simple genius is on display.
You posted in http://voice2json.org/#ideas
It's basically datasets being formatted and made available for KWS, as there are huge datasets for …
-
Hi, it's one of my first times working with ASR and Kaldi.
I got your server running with the "tedlium_nnet_ms_sp_online" English model. Everything is working fine (with and without Docker).
Then…
-
**Debugging checklist**
[x] Have you read the troubleshooting page (https://montreal-forced-aligner.readthedocs.io/en/latest/user_guide/troubleshooting.html) and searched the documentation to ensur…
-
Great job on implementing the paper!
Question: why did you use python_speech_features.fbank instead of librosa.feature.melspectrogram?
Both transformations are the same, right?
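For what it's worth, the two are not interchangeable with default settings: python_speech_features uses the HTK mel formula, while librosa defaults to the Slaney-style mel scale (linear below 1 kHz, logarithmic above), so the filterbank center frequencies differ; librosa also defaults to different windowing and a power spectrogram. A small numpy sketch of just the two mel conversions (the published formulas, not either library's code):

```python
import numpy as np

def hz_to_mel_htk(f):
    # HTK formula, used by python_speech_features
    return 2595.0 * np.log10(1.0 + f / 700.0)

def hz_to_mel_slaney(f):
    # Slaney / Auditory Toolbox formula, librosa's default (htk=False):
    # linear below 1 kHz, logarithmic above
    f = np.asarray(f, dtype=float)
    mel = 3.0 * f / 200.0                  # linear region below 1 kHz
    log_step = np.log(6.4) / 27.0          # log step size above 1 kHz
    above = f >= 1000.0
    mel = np.where(above,
                   15.0 + np.log(np.maximum(f, 1e-9) / 1000.0) / log_step,
                   mel)
    return mel

for f in (500.0, 1000.0, 4000.0):
    print(f, float(hz_to_mel_htk(f)), float(hz_to_mel_slaney(f)))
```

So identical nfilt/nfft settings still place the triangular filters at different frequencies unless librosa is called with `htk=True`.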
-
When I run `python model_main.py --num_epochs=100 --track=logical --features=mfcc --lr=0.00005`, I encounter an error:

```
File "model_main.py", line 184, in
dev_set = data_utils.ASVDataset(is_train…
```
-
While torchaudio provides a Mel-scaled spectrogram transformation (`torchaudio.transforms.MEL`), there are a few additional spectral feature transformations that are extremely useful for pre-processing…
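As an illustration of the kind of extra spectral feature meant here, a spectral centroid can be computed from a magnitude spectrum in a few lines. This is a plain-numpy sketch of the math, not torchaudio's API:

```python
import numpy as np

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of one frame, in Hz."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return float((freqs * mag).sum() / mag.sum())

# sanity check: a pure 1 kHz tone should have its centroid near 1 kHz
sr = 16000
t = np.arange(1024) / sr
tone = np.sin(2 * np.pi * 1000.0 * t)
print(spectral_centroid(tone, sr))
```

Other common additions (spectral rolloff, flatness, deltas) follow the same pattern: a per-frame reduction over the magnitude spectrum.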
-
In python_speech_features's mfcc - which we are taking as our yardstick - winlen (window length) and nfft (FFT size) are independent parameters.
According to the documentation, _raw_fft zero-pads f…
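The zero-padding in question can be reproduced in a couple of numpy lines: a frame of winlen*rate samples is padded with zeros up to nfft before the FFT, which is why the two parameters are independent. A sketch of the behaviour (numpy's `rfft` performs the same padding internally when given `n=nfft`):

```python
import numpy as np

rate, winlen, nfft = 16000, 0.025, 512
rng = np.random.default_rng(0)
frame = rng.standard_normal(int(rate * winlen))   # 400 samples per 25 ms frame

# explicit zero-padding of the 400-sample frame up to nfft
padded = np.pad(frame, (0, nfft - len(frame)))
manual = np.abs(np.fft.rfft(padded))

# np.fft.rfft does the same padding internally when n > len(frame)
internal = np.abs(np.fft.rfft(frame, n=nfft))

print(len(frame), len(padded), len(manual))   # 400 512 257
```

Padding changes the frequency-bin spacing (rate/nfft) without adding information, so nfft can be raised independently of the analysis window length.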