-
I've been playing around with trying to reconstruct an STFT spectrogram from a Mel spectrogram (derived using the `MelSpectrogram` class) and wondered if you might be interested in incorporating somet…
-
Hello,
This is my fault
`23:38:47:singlepulse$ psrfits2fil qtt_190429_094555_0001.sf 1.382525715 1556
Output filterbank file qtt_190429_094555_0001.fil
Opened file 'qtt_190429_094555_0001.sf'
O…
-
I have read your paper, but I still don't quite understand how the two modes are aligned and fused. Could you explain? Thank you!
-
Hi,
Thanks for your excellent work. I'd like to ask about performance on the ASR task. Do you have any experimental results on LibriSpeech or any other corpus?
-
Test parameters for Kaldi fbank were generated using [this script](https://github.com/pytorch/audio/blob/8a03087ede7d5b58e6562e4d2ac78dc904303b56/test/compliance/generate_fbank_data.py#L15-L20), but t…
-
I was excited to try this script out, but the result indicates that something has gone wrong.
Here's the console output:
`$ /usr/local/bin/python phasebasedMoMag.py
Reading: media/guitar.mp4 300 …
-
I tried to test the ASR task from the CLI, but it failed. Am I missing anything?
$ m4t_predict --model seamlessM4T_medium 16k.wav asr eng
2023-08-23 16:17:41,203 INFO -- m4t_scripts.predict.predict: Running infere…
-
Great job implementing the paper!
Question: why did you use `python_speech_features.fbank` instead of `librosa.feature.melspectrogram`?
Both transformations are the same, right?
-
E.g. transforms like this: https://github.com/zcaceres/spec_augment
-
Hello,
It looks like the whole encoder library allocates up to 240 KB of memory, while the only two modules needed for the ELD profile (AAC + SBR) take about 170 KB of RAM (mono channel).
Could these si…