Train Own Voice for "Speaker 0" Naming

BasedHardware / omi

AI wearables

https://omi.me

MIT License


Train Own Voice for "Speaker 0" Naming #173

Closed. ThatGuySizemore closed this issue 5 months ago.

ThatGuySizemore commented 6 months ago

As a user, I want to be able to train Deepgram's transcription functions to identify my own voice, so that anything I say is labeled with my name and summaries can distinguish what I said from what other people said.

after-ephemera commented 6 months ago

Thanks for the feedback @ThatGuySizemore. Does Deepgram offer this functionality right now? I'd love to see a proof of concept of what it might look like to train and use a customized model.

ThatGuySizemore commented 6 months ago

@after-ephemera - Looks like Deepgram's diarization doesn't support it, unlike Whisper. I wonder if there is a different solution to identify "self" in the transcriptions. The summaries require a bit of decoding to understand precisely what happened. For example, when someone chats with memories and asks "Are there any tasks I agreed to do today?", it typically won't work because the speaker isn't identified. Rarely, though, Deepgram can use context clues in a conversation to identify "self" vs. other people. Granted, it's hit or miss.
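
One possible direction for identifying "self", sketched here purely as an illustration and not as anything Deepgram or omi provides: compare a voice embedding enrolled by the user against one embedding per diarized speaker, and rename the best match. Everything below is an assumption, including the `label_self` helper, the segment shape, the `threshold` value, and the idea that some speaker-embedding model has already produced the vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_self(segments, speaker_embeddings, enrolled_embedding, user_name,
               threshold=0.75):
    """Rename the diarized speaker that best matches the enrolled voice.

    segments: list of {"speaker": int, "text": str} (hypothetical shape).
    speaker_embeddings: {speaker_id: vector}, assumed to come from a
        separate speaker-embedding model that is not shown here.
    threshold: illustrative cutoff; below it, nobody is renamed.
    """
    best_speaker, best_score = None, threshold
    for speaker, embedding in speaker_embeddings.items():
        score = cosine_similarity(embedding, enrolled_embedding)
        if score >= best_score:
            best_speaker, best_score = speaker, score

    labeled = []
    for seg in segments:
        if seg["speaker"] == best_speaker:
            name = user_name
        else:
            name = f"Speaker {seg['speaker']}"
        labeled.append({"speaker": name, "text": seg["text"]})
    return labeled


if __name__ == "__main__":
    # Toy two-dimensional vectors stand in for real voice embeddings.
    segments = [
        {"speaker": 0, "text": "I'll send the report today."},
        {"speaker": 1, "text": "Thanks, that works."},
    ]
    embeddings = {0: [1.0, 0.0], 1: [0.0, 1.0]}
    for line in label_self(segments, embeddings, [0.9, 0.1], "Alex"):
        print(f"{line['speaker']}: {line['text']}")
```

With labels like these in the transcript, a question such as "Are there any tasks I agreed to do today?" would at least have a named speaker to resolve "I" against. When no speaker clears the threshold, the generic "Speaker N" labels are kept rather than guessing.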

ThatGuySizemore commented 6 months ago

@after-ephemera I'm doing some additional digging and working with some of the devs over at Deepgram to understand the API better.

josancamon19 commented 5 months ago

Hey @ThatGuySizemore, thanks for pointing this out. Deepgram actually doesn't work well for this, so we are trying to build it on our own. Will keep you updated :)

bbookman commented 5 months ago

Is this not implemented now?