Oh, and this is kind of off topic, but I saw that you mentioned Fish Speech. It's an AI I'm really interested in training locally, but I can't manage to do it; the documentation is pretty poor and confusing. Do you plan to make a small guide in the future? Have you had good results?
Oh yeah, I haven't implemented an option to use Whisper locally yet. You can skip the transcription phase entirely if you just want to use Whisper yourself to get the data in the right format; then you just put the wav files and metadata.csv in the correct folder and go from there. On the command line it will ask you at the beginning if you want to skip transcription; just type "y".
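If it helps, here's a minimal sketch of running Whisper locally and writing the results out as a metadata.csv. I'm assuming an LJSpeech-style pipe-delimited format (filename|text) and a folder called dataset; check against whatever format the tool actually expects:

```python
# Hedged sketch: transcribe local wavs with openai-whisper and write an
# LJSpeech-style metadata.csv (filename|text). The folder name and the
# exact column format the tool expects are assumptions.
from pathlib import Path
import whisper  # pip install openai-whisper

DATA_DIR = Path("dataset")          # assumed location of your wav files
model = whisper.load_model("base")  # use a larger model for better accuracy

with open(DATA_DIR / "metadata.csv", "w", encoding="utf-8") as f:
    for wav in sorted(DATA_DIR.glob("*.wav")):
        text = model.transcribe(str(wav))["text"].strip()
        f.write(f"{wav.stem}|{text}\n")
```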
As for Fish Speech: yeah, their finetuning process was a bit different, and they don't really explain a lot of things. There was a hidden webui they seem to have forgotten about or something, but it essentially walks you through the finetuning process. Ultimately I only briefly tested it and made a small LoRA/model. I think they still have some room for refinement, so I was going to return to it later once they iron out some issues. Let's see...
It's at fish_speech/webui/manage.py, so from the repo root you run: python fish_speech/webui/manage.py
You just need to put your data in the right format: the wav files go in the data folder, and each wav file gets a paired .lab file, which is just a plain-text transcript with a different extension. So it looks like this (a small conversion sketch follows the list):
audiofile1.wav audiofile1.lab
audiofile2.wav audiofile2.lab
etc..
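Here's a hedged sketch of converting the metadata.csv from the earlier step into those wav + .lab pairs. The source and destination paths are assumptions; point them at wherever your files actually live and wherever fish-speech wants its data:

```python
# Hedged sketch: turn an LJSpeech-style metadata.csv (filename|text) into
# the wav + .lab pairs described above. Both paths are assumptions.
from pathlib import Path
import shutil

SRC = Path("dataset")           # wavs + metadata.csv from the Whisper step
DST = Path("fish-speech/data")  # assumed fish-speech data folder

DST.mkdir(parents=True, exist_ok=True)
for line in (SRC / "metadata.csv").read_text(encoding="utf-8").splitlines():
    if not line.strip():
        continue
    stem, text = line.split("|", 1)
    shutil.copy(SRC / f"{stem}.wav", DST / f"{stem}.wav")
    # the .lab file is just the transcript as plain text
    (DST / f"{stem}.lab").write_text(text.strip(), encoding="utf-8")
```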
I appreciate your insightful response. Yes, I share your opinion of Fish Speech. As for the webui: by carefully following the steps in their limited documentation, I believe I've managed to access it. I don't know if it's the hidden webui you're referring to, since there seems to be one webui for training and another for inference. I'll send a screenshot in a few hours.
On the other hand, no matter what I did, even reinstalling Fish Speech from scratch, I always get the error I describe here: https://github.com/fishaudio/fish-speech/issues/601. The best answer there was basically to remove Python from my system, lol. What's the point of installing Fish Speech in a virtual environment, which is supposedly isolated from conflicts with other libraries? 🤨
You can try pip install --upgrade lightning, but I doubt that will work.
If you're on Windows, I would use WSL2 for essentially all AI development tasks. The error you're getting looks OS-specific, and just using WSL2 will save you a lot of headache. It's very simple to install and I use it all the time. At that point you just follow the Linux installation instructions; you don't have to do anything special, and there's rarely anything extra required. With WSL2 you essentially have a Linux machine and can treat it as such.
Hi, while reading the discussions on AllTalk I came across this repository. It looks very interesting and promising. Like you, I have a 3090, and I would like to use Whisper locally. What should I change in the code (I guess in step 1?) to use Whisper locally instead of the Deepgram API? Thanks!