Carsonthemonkey / GIST

App to summarize audio files for the LC ACM spring 2023 hackathon
MIT License

Support a more feature rich transcription API #26

Closed Carsonthemonkey closed 1 year ago

Carsonthemonkey commented 1 year ago

Speaker diarization would be pretty cool, so this could be something we look into. I think there is a fork of OpenAI's Whisper API (which we currently use) that could be used to do this, but it may only work in Python. If so, we might have to do some backend work, which I am scared of. On the other hand, we may be able to get it running locally instead of through an API call, which would be pretty cool. There is also Google Cloud Speech-to-Text, which has diarization, but it would require a separate API key, and I'm not sure whether it supports translation as well as Whisper does.