gksoriginals opened this issue 2 months ago
@bsbarkur thoughts on this?
Sure, the proper way to do this is perhaps:
And try out both of these modes above! The person should have the freedom to select one of them.
Yes. In the first case we can collect some irregular speech data in a privacy-preserving way to train models for them at a later stage.
References to read for irregular speech data training @gksoriginals:

- https://sites.research.google/euphonia/about/
- https://www.isca-archive.org/interspeech_2021/macdonald21_interspeech.pdf
@maximaminima I was thinking about a postprocessing layer using an LLM to fix the sentences.
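Something like this rough sketch, assuming an OpenAI-style chat API (the model name and prompt wording are placeholders, not a settled design):

```python
# Rough sketch of the LLM postprocessing layer (assumption: an
# OpenAI-style chat API; model name and prompt are placeholders).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def postprocess_transcript(raw_transcript: str) -> str:
    """Ask an LLM to repair grammar and obvious misrecognitions
    while preserving the speaker's intended meaning."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": (
                    "You correct ASR transcripts of irregular speech. "
                    "Fix grammar and likely misrecognitions, but keep "
                    "the speaker's intended meaning."
                ),
            },
            {"role": "user", "content": raw_transcript},
        ],
    )
    return response.choices[0].message.content

print(postprocess_transcript("i wants go store buy milk tomorow"))
```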
This seems like a good and quick solution. We just need to make sure there is an option so that the LLM identifies the speaker's language. Would it work?
@ambikajo but we need to fine-tune ASR models for them to understand the irregular speech of a deaf person. I guess this is a bit complex. Speaker language identification is something most ASR models/APIs already do.
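For context, fine-tuning could roughly follow the standard Hugging Face Whisper recipe once we have collected samples. This is only a sketch: the dataset path, the `transcription` column, and the hyperparameters are all assumptions.

```python
# Rough sketch of fine-tuning Whisper on collected irregular-speech
# samples, following the standard Hugging Face recipe. The data_dir
# and "transcription" column are assumptions about data layout.
from datasets import load_dataset, Audio
from transformers import (WhisperProcessor, WhisperForConditionalGeneration,
                          Seq2SeqTrainingArguments, Seq2SeqTrainer)

processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Hypothetical audio folder with a metadata.csv mapping files to transcriptions.
ds = load_dataset("audiofolder", data_dir="irregular_speech/")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

def prepare(batch):
    audio = batch["audio"]
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    batch["labels"] = processor.tokenizer(batch["transcription"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=ds.column_names["train"])

def collate(features):
    # Pad audio features and label token ids separately, masking label
    # padding with -100 so it is ignored by the loss.
    batch = processor.feature_extractor.pad(
        [{"input_features": f["input_features"]} for f in features],
        return_tensors="pt",
    )
    labels = processor.tokenizer.pad(
        [{"input_ids": f["labels"]} for f in features], return_tensors="pt"
    )
    batch["labels"] = labels["input_ids"].masked_fill(
        labels["attention_mask"].ne(1), -100
    )
    return batch

args = Seq2SeqTrainingArguments(
    output_dir="whisper-irregular-speech",
    per_device_train_batch_size=8,
    learning_rate=1e-5,  # placeholder hyperparameters
    max_steps=1000,
)
Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    data_collator=collate,
).train()
```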
Ok, I understand what you mean now. I can do an ASR eval (on Gooey) if you have some samples with transcriptions.
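A minimal version of such an eval with `jiwer` (the sentence pairs below are made-up placeholders for real samples):

```python
# Quick ASR eval sketch: corpus-level word error rate with jiwer.
import jiwer

references = [  # ground-truth transcriptions
    "i want to go to the store to buy milk tomorrow",
    "please call me back in the evening",
]
hypotheses = [  # what the ASR model produced
    "i wants go store buy milk tomorow",
    "please call me back in evening",
]

print(f"WER: {jiwer.wer(references, hypotheses):.2%}")
```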
The deaf person will also try to speak sometimes. We need a model to figure out whether the transcription quality is poor, and if it is below a particular threshold we should not show it.
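One possible gate, sketched with the open-source `whisper` package, which reports per-segment `avg_logprob` and `no_speech_prob`; the threshold values here are assumptions that would need tuning on real recordings:

```python
# Sketch of a quality gate: suppress transcripts whose ASR confidence
# falls below a threshold. Thresholds are placeholder values.
import math
import whisper

model = whisper.load_model("small")

def transcribe_if_confident(audio_path: str,
                            min_avg_prob: float = 0.6,
                            max_no_speech_prob: float = 0.5) -> str | None:
    result = model.transcribe(audio_path)
    segments = result.get("segments", [])
    if not segments:
        return None
    # avg_logprob is the mean token log-probability per segment;
    # exp() turns it into an approximate per-token probability.
    avg_prob = sum(math.exp(s["avg_logprob"]) for s in segments) / len(segments)
    worst_no_speech = max(s["no_speech_prob"] for s in segments)
    if avg_prob < min_avg_prob or worst_no_speech > max_no_speech_prob:
        return None  # below threshold: don't show the transcript
    return result["text"]
```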