gongouveia / Whisper-Synthetic-ASR-Dataset-Generator

This UI serves as a Synthetic ASR Dataset Generator powered by (and built for) OpenAI Whisper. It lets users capture audio, transcribe it on the fly, and manage the generated dataset 🤗. Use it to fine-tune Whisper on enhanced and custom datasets.

Include transcription support for MP3 audio files #3

Open · kojomensahonums opened this issue 2 months ago

kojomensahonums commented 2 months ago

The system currently does not work with files other than .wav. I tried manually changing the extensions in the codebase, but ended up with "Error: Audio file does not contain RIFF_id". What changes are needed to support MP3 or other audio file types?
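For context, Python's standard-library `wave` module only parses RIFF/WAVE containers, so renaming an extension cannot help: the bytes of an MP3 file still do not begin with a RIFF header. The sketch below is illustrative only. `looks_like_wav` and `make_silent_wav` are hypothetical helpers, not part of this project, and the fake "MP3" is just an ID3-style header followed by padding. It checks the 12-byte WAV header and shows `wave.open` rejecting non-RIFF data; the exact wording of the standard library's message may differ from the one quoted above.

```python
import io
import wave


def looks_like_wav(header: bytes) -> bool:
    """Return True if the first 12 bytes form a RIFF/WAVE header.

    Hypothetical helper: a WAV file starts with b"RIFF", a 4-byte
    little-endian chunk size, then b"WAVE". MP3 files usually start with
    an ID3 tag (b"ID3") or an MPEG frame sync instead.
    """
    return len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE"


def make_silent_wav(n_frames: int = 160, rate: int = 16000) -> bytes:
    """Build a tiny mono 16-bit PCM WAV file in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


# A fake "MP3" payload: an ID3-style tag header followed by padding.
fake_mp3 = b"ID3\x04\x00\x00\x00\x00\x00\x00" + b"\x00" * 64
real_wav = make_silent_wav()

print(looks_like_wav(real_wav[:12]))  # True
print(looks_like_wav(fake_mp3[:12]))  # False

try:
    wave.open(io.BytesIO(fake_mp3), "rb")
except wave.Error as exc:
    # `wave` only understands RIFF/WAVE, whatever the file's extension says.
    print("wave.Error:", exc)
```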

gongouveia commented 2 months ago

@kojomensahonums Hello, in the menu I use the "wave" library, which has no MP3 support, so simply changing the file extension will not work. Later today or at the end of the week, I will add support for MP3 files. In the meantime, you could modify the code to load audio with pydub (and contribute it to the project), or convert your MP3 files to WAV with FFmpeg, for example.
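The pydub route might look roughly like the sketch below. It is not the project's code: `load_audio_as_pcm16` is a hypothetical helper, and it assumes the third-party `pydub` package plus an FFmpeg binary are installed whenever the input is not already a suitable WAV. Files that are already mono, 16-bit WAV at the target rate keep going through `wave`; anything else is decoded by pydub and normalised to mono, 16-bit, 16 kHz, the sample rate Whisper models work at.

```python
import wave


def load_audio_as_pcm16(path: str, target_rate: int = 16000) -> bytes:
    """Return the audio in `path` as raw mono 16-bit PCM at `target_rate` Hz.

    Hypothetical helper, not part of this project. Files that are already
    mono 16-bit WAV at the target rate are read with the standard-library
    `wave` module; everything else (MP3, other WAV layouts, FLAC, ...) is
    decoded with pydub, which requires FFmpeg to be installed.
    """
    try:
        with wave.open(path, "rb") as w:
            params = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if params == (1, 2, target_rate):
                return w.readframes(w.getnframes())
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file (e.g. an MP3 renamed to .wav) or a WAV
        # encoding `wave` cannot parse: fall through to pydub.
        pass

    # Imported lazily so that WAV-only setups do not need pydub installed.
    from pydub import AudioSegment

    segment = AudioSegment.from_file(path)  # pydub hands decoding to FFmpeg
    segment = segment.set_channels(1).set_sample_width(2).set_frame_rate(target_rate)
    return segment.raw_data
```

Because the check is on file contents rather than the extension, an MP3 that was renamed to `.wav` would still be routed to pydub instead of failing in `wave`. Turning the returned bytes into whatever array the transcription code expects is left out, since that depends on how the project calls Whisper.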

gongouveia commented 1 month ago

Hello @kojomensahonums, lately I have not had much time to work on this feature; it would need a substantial code refactor. In the meantime, you can batch-convert your MP3 files to WAV with FFmpeg, as described here: https://ottverse.com/convert-all-files-inside-folder-ffmpeg-batch-convert/
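The linked guide does the batch conversion with FFmpeg from the shell. A roughly equivalent Python sketch follows. The function name `batch_convert_mp3_to_wav` is hypothetical, and the `-ac 1 -ar 16000` options (mono, 16 kHz output) are an assumption added here, not taken from the guide or the project. It assumes `ffmpeg` is on `PATH`; with `dry_run=True` it only builds and returns the commands.

```python
import subprocess
from pathlib import Path


def batch_convert_mp3_to_wav(folder: str, dry_run: bool = False) -> list[list[str]]:
    """Convert every *.mp3 in `folder` to a .wav file next to it using FFmpeg.

    Hypothetical helper. Existing .wav files are skipped, so re-running is
    safe. Returns the FFmpeg commands that were (or, in dry-run mode, would
    be) executed.
    """
    commands = []
    for src in sorted(Path(folder).glob("*.mp3")):  # case-sensitive on Linux
        dst = src.with_suffix(".wav")
        if dst.exists():
            continue  # already converted
        cmd = [
            "ffmpeg",
            "-i", str(src),
            "-ac", "1",      # downmix to mono
            "-ar", "16000",  # resample to 16 kHz
            str(dst),        # .wav extension -> PCM WAV output
        ]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands
```

For example, `batch_convert_mp3_to_wav("recordings", dry_run=True)` lists what would run without touching any files; dropping `dry_run` performs the conversion and raises `CalledProcessError` if FFmpeg fails on a file.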

kojomensahonums commented 1 month ago

@gongouveia Running the batch conversion command now. Working nicely, thank you.