akashmjn / tinydiarize

Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens
MIT License
439 stars 14 forks

How to train a new model? #22

Open shell1986 opened 10 months ago

shell1986 commented 10 months ago

Hello. I have very little experience with training models, but I am very interested in trying to build a model for diarization of anime characters. I have prepared a dataset from ASS subtitles in which the lines are tagged with the character who speaks them. I wrote a script that uses these subtitles to cut out the audio for each tagged phrase and sorts the resulting audio files into different folders. Now I want to understand how to train a model on this data.
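For readers who want to reproduce this kind of preprocessing, here is a minimal Python sketch. It is not the script mentioned above, only an illustration under a few assumptions: the speaker is stored in the `Name` (or `Actor`) column of each `Dialogue:` line, the column order is taken from the `Format:` line of the `[Events]` section, and `ffmpeg` is available for the actual cutting. The function names and the `ep01.mkv` path are hypothetical.

```python
import re
from collections import defaultdict


def ass_time_to_seconds(stamp):
    """Convert an ASS timestamp such as '0:01:02.50' (H:MM:SS.cc) to seconds."""
    hours, minutes, seconds = stamp.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_ass_dialogue(ass_text):
    """Group dialogue lines by speaker.

    Returns {speaker: [(start_seconds, end_seconds, text), ...]}.
    Assumes the speaker is in the 'Name' (or 'Actor') column.
    """
    fields = None
    in_events = False
    segments = defaultdict(list)
    for raw in ass_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Section header such as [Script Info] or [Events]
            in_events = line.lower() == "[events]"
            continue
        if not in_events:
            continue
        if line.startswith("Format:"):
            fields = [f.strip() for f in line[len("Format:"):].split(",")]
        elif line.startswith("Dialogue:") and fields:
            # Text is the last column and may itself contain commas,
            # so split at most len(fields) - 1 times.
            values = line[len("Dialogue:"):].split(",", len(fields) - 1)
            row = dict(zip(fields, (v.strip() for v in values)))
            speaker = row.get("Name") or row.get("Actor") or "unknown"
            # Drop override tags like {\i1} and turn ASS line breaks into spaces.
            text = re.sub(r"\{[^}]*\}", "", row.get("Text", ""))
            text = text.replace("\\N", " ").replace("\\n", " ").strip()
            start = ass_time_to_seconds(row["Start"])
            end = ass_time_to_seconds(row["End"])
            segments[speaker].append((start, end, text))
    return dict(segments)


def ffmpeg_cut_command(audio_path, start, end, out_path):
    """Build (but do not run) an ffmpeg command that extracts one segment."""
    return [
        "ffmpeg", "-y", "-i", audio_path,
        "-ss", f"{start:.2f}", "-t", f"{end - start:.2f}",
        "-vn", out_path,
    ]


if __name__ == "__main__":
    sample = r"""[Events]
Format: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
Dialogue: 0,0:00:01.00,0:00:03.50,Default,Alice,0,0,0,,{\i1}Hello, world{\i0}
Dialogue: 0,0:00:04.00,0:00:05.25,Default,Bob,0,0,0,,Hi\Nthere
"""
    for speaker, items in parse_ass_dialogue(sample).items():
        for index, (start, end, text) in enumerate(items):
            out_path = f"{speaker}/{index:04d}.wav"
            print(ffmpeg_cut_command("ep01.mkv", start, end, out_path), text)
```

Each returned command can be passed to `subprocess.run` once the per-speaker output folders exist. Keeping the text alongside the timestamps is deliberate: a Whisper-style model is trained on audio paired with transcripts, so the subtitle text is likely to be needed in addition to the audio clips.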

If you have links or examples showing how to do this, please share them. I program in PHP and JS, and sometimes write some C++.