Is this repo usable for a production use case!!

MahmoudAshraf97 / whisper-diarization

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

BSD 2-Clause "Simplified" License

2.44k stars, 238 forks

Is this repo usable for a production use case!! #158

Open utility-aagrawal opened 5 months ago

utility-aagrawal commented 5 months ago

Hi All,

I am wondering if anyone has used this repo for a production use case. Currently, I am using OpenAI Whisper for transcription, but I now want to add speaker diarization. I have tried pyannote in the past, but the results from this repo look much better. My concern is that the source code wasn't written with production use in mind: it is not very flexible, it prints too many log messages, etc. I could rewrite the code, but then what happens when there are upstream updates in the future? I would appreciate the community's input on this. Thanks!

utility-aagrawal commented 5 months ago

@MahmoudAshraf97, I would appreciate your take on this! Thanks for sharing your work!

MahmoudAshraf97 commented 5 months ago

Hello, and thanks for the input. Please open a PR with any changes you think are useful, and we can discuss them together.

utility-aagrawal commented 5 months ago

@MahmoudAshraf97, thanks for your understanding! This is what I want to do:

1) Leave existing functionality as is.

2) Please see the attached file, whisper_diarization_stdout.txt. Currently, many messages, warnings, and logs are printed to the command line. I want to make this optional, so that users can choose whether they see these messages.

3) Users should be able to run the whole pipeline locally if they want to, meaning they can download all the models into a directory beforehand. faster-whisper and whisperX's load_align_model already support this. I can check whether the other models can be used the same way. Do you know if this is feasible? What other models does this pipeline use? I still have to go through the code and don't have the answer yet.

4) Format the code for readability and usability.
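For illustration, item 2 might be approached with a small context manager built only on the Python standard library. This is a minimal sketch, not code from the repo: the name `console_output` and the `verbose` parameter are hypothetical, and it assumes the noisy output goes through Python's `logging`, `warnings`, or `print`.

```python
import contextlib
import io
import logging
import warnings


@contextlib.contextmanager
def console_output(verbose: bool):
    """Hypothetical helper: hide console noise unless verbose is True."""
    if verbose:
        # Leave logging, warnings, and stdout untouched.
        yield
        return
    # Remember the current global threshold so it can be restored afterwards.
    previous_disable = logging.root.manager.disable
    # Drop records at WARNING level and below from every logger.
    logging.disable(logging.WARNING)
    try:
        with warnings.catch_warnings(), contextlib.redirect_stdout(io.StringIO()):
            # Silence Python warnings and swallow print() output.
            warnings.simplefilter("ignore")
            yield
    finally:
        logging.disable(previous_disable)
```

The pipeline call would then be wrapped in `with console_output(verbose=...):`, with the value coming from a (hypothetical) command-line flag. Output written directly to stderr, by subprocesses, or by native code is not captured by this sketch, so progress bars and some library messages may still appear.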

Let me know what you think. It will take some time to make all these changes. Before I spend any time, I wanted to align with you. Thanks!
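As a rough sketch of item 3 for the transcription model only: faster-whisper's `WhisperModel` accepts `download_root` and `local_files_only` arguments, so a pre-populated directory can be used without network access. The helper names `whisper_kwargs` and `load_whisper` and the `model_dir` parameter are hypothetical, and this says nothing about the other models in the pipeline, which would need to be checked separately.

```python
from pathlib import Path


def whisper_kwargs(model_dir=None):
    """Hypothetical helper: build keyword arguments for faster_whisper.WhisperModel.

    With no model_dir, models are fetched on demand as usual.
    With a model_dir, only files already present in that directory are used.
    """
    if model_dir is None:
        return {"local_files_only": False}
    return {"download_root": str(Path(model_dir)), "local_files_only": True}


def load_whisper(model_name, model_dir=None, device="cpu"):
    """Hypothetical helper: load a faster-whisper model, optionally offline."""
    # Imported lazily so whisper_kwargs can be used without faster-whisper installed.
    from faster_whisper import WhisperModel

    return WhisperModel(model_name, device=device, **whisper_kwargs(model_dir))
```

This assumes the directory was filled beforehand by loading the same model once with the same `download_root` while online. Alternatively, the path of a directory containing an already converted model can be passed directly as the first argument to `WhisperModel`.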

utility-aagrawal commented 5 months ago

@MahmoudAshraf97, do you have any feedback?

utility-aagrawal commented 4 months ago

@MahmoudAshraf97, any thoughts?

aedocw commented 2 months ago

I'm not speaking for @MahmoudAshraf97 here, but if you take a look at his response from Jan 24, it's pretty clear. This is an open source project that he works on for his own reasons. @utility-aagrawal, you are treating it like a commercial product that you are paying for.

If you want these changes, you are free to implement them and submit PRs to get them merged into the project. If you are not a developer, you could pay someone to do the work and submit the patches.

transcriptionstream commented 2 months ago

I have this running in a production environment. It's stable, consistent, and does a great job.