-
This issue is an overview of tasks to add for a massive multimodal extension of MTEB. The modalities are:
- T=Text
- I=Image
- A=Audio
- V=Video without audio, i.e. just a sequence of images
Below is…
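The four modalities above imply a grid of cross-modal task types (e.g. text-to-image retrieval). A minimal sketch of how those combinations could be enumerated, assuming a hypothetical helper that is not part of MTEB itself:

```python
from itertools import product

# Hypothetical helper (not part of MTEB): enumerate query->document
# modality pairs for the four modalities listed above.
MODALITIES = {
    "T": "Text",
    "I": "Image",
    "A": "Audio",
    "V": "Video",  # video without audio, i.e. a sequence of images
}

def cross_modal_pairs():
    """Return modality pairs such as 'T->I' for text-to-image tasks."""
    return [f"{q}->{d}" for q, d in product(MODALITIES, repeat=2)]

print(len(cross_modal_pairs()))  # 4 modalities -> 16 pairs
```

Same-modality pairs like `T->T` correspond to existing MTEB tasks; the off-diagonal pairs are the new cross-modal ones.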
-
**Description**
When verifying audio clips in "Listen", some sound clips will not play back properly. In those cases, there is no sound and although there might be indication that something is playin…
-
Hi,
Are you planning on releasing a pretrained model any time soon?
Thanks
-
Hello Patrick @patrickvonplaten
Thanks for the nice post on how to fine-tune wav2vec2. It was quite intuitive and simple.
I had been trying to fine tune "facebook/wav2vec2-large-xlsr-53" and als…
-
I tried to train simultaneous speech translation following [simul_mustc_example.md](https://github.com/pytorch/fairseq/blob/main/examples/speech_to_text/docs/simulst_mustc_example.md). I trained simul…
-
Hello, when sending requests to Google Web Speech API, does Google collect the data? If so, are we able to opt-out of Google data collection?
-
Hello, thanks for your project.
I have a question regarding the data for further fine-tuning. I need to start collecting data; what characteristics should the data have?
Could the data have noise …
-
I've trained a model using `mfa_train_and_align`, and would like to reuse it. When I run the aligner like so
```
bin/mfa_align corpus/ dictionary model/model.zip output/
```
it successfully gets thr…
-
Hi, may I ask whether this task is still relevant? If so, what dataset and model should be used for the accent classification? I would like to work on this task if it is possible.
-
I have extensively used the Zipformer model (both streaming and non-streaming variants) and have noticed the following errors. Testing has been done with greedy search as well as higher beam size va…