facebookresearch / seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

ASR fine-tuning #327

Open h9-tect opened 8 months ago

h9-tect commented 8 months ago

Hello, I'm trying to fine-tune the small model for ASR on a custom Egyptian dataset. How can I do it? Here is a sample of my custom data; is it in the right format?

[Screenshot from 2024-01-18 14-15-11]

adnankarim commented 8 months ago

In train.json, each line should contain one JSON object like this:

{"source": {"id": 1806, "lang": "eng", "text": "", "audio_local_path": "path to .wav", "waveform": null, "sampling_rate": 16000, "units": null}, "target": {"id": 1806, "lang": "urd", "text": "", "audio_local_path": "path to 491841998166793263.wav", "waveform": null, "sampling_rate": 16000, "units": null}}

{"source": {"id": 1806, "lang": "eng", "text": "", "audio_local_path": "path to .wav", "waveform": null, "sampling_rate": 16000, "units": null}, "target": {"id": 1806, "lang": "urd", "text": "", "audio_local_path": "path to 491841998166793263.wav", "waveform": null, "sampling_rate": 16000, "units": null}}

{"source": {"id": 1806, "lang": "eng", "text": "", "audio_local_path": "path to .wav", "waveform": null, "sampling_rate": 16000, "units": null}, "target": {"id": 1806, "lang": "urd", "text": "", "audio_local_path": "path to 491841998166793263.wav", "waveform": null, "sampling_rate": 16000, "units": null}}

adnankarim commented 8 months ago

The same format applies to validation_manifest.json.

MuhammadWaqarSahi commented 7 months ago

@adnankarim, can you share a full fine-tuning notebook or code that prepares the dataset and fine-tunes the model? I want to train the model for speech-to-text on my custom Urdu-language dataset.

MuhammadWaqarSahi commented 7 months ago

My dataset is currently stored as 1.wav and 1.txt, and so on.
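For a folder of paired files like this, one possible way to produce a manifest in the one-object-per-line format shown earlier in this thread is sketched below. Several things here are assumptions rather than confirmed behaviour of the repository: the helper names `make_entry` and `build_manifest` are hypothetical, the transcript is placed in the target `text` field, source and target use the same language code for ASR, the target audio path is left as null, and the audio is assumed to already be sampled at 16 kHz.

```python
import json
from pathlib import Path


def make_entry(sample_id, lang, text, audio_path, sampling_rate=16000):
    """Build one side of a record using the field names from the sample manifest."""
    return {
        "id": sample_id,
        "lang": lang,
        "text": text,
        "audio_local_path": audio_path,
        "waveform": None,
        "sampling_rate": sampling_rate,  # assumed: audio is already 16 kHz
        "units": None,
    }


def build_manifest(data_dir, out_path, lang):
    """Write one JSON object per line for every N.wav that has a matching N.txt."""
    data_dir = Path(data_dir)
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for wav in sorted(data_dir.glob("*.wav")):
            txt = wav.with_suffix(".txt")
            if not txt.exists():
                continue  # skip audio files without a transcript
            transcript = txt.read_text(encoding="utf-8").strip()
            record = {
                # Source side: the audio to be transcribed.
                "source": make_entry(count, lang, "", str(wav)),
                # Target side: the transcript (assumed placement for ASR).
                "target": make_entry(count, lang, transcript, None),
            }
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, `build_manifest("data/train", "train.json", "urd")` would write one line per wav/txt pair and return the number of records written; the `"urd"` code is the one used in the sample above.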