Open dudash opened 2 years ago
Was thinking we could use NVIDIA Riva. Did some brainstorming in Miro here: https://miro.com/app/board/uXjVOZLd2gQ=/
Notionally we will:
First, import the Riva API.
Next, create a gRPC channel to the Riva endpoint.
Then, create an ASR request.
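A rough sketch of those three steps, assuming the `nvidia-riva-client` Python package and a Riva server listening on `localhost:50051` (both are assumptions, nothing is decided yet). `RIVA_URI`, `chunk_pcm` and `transcribe` are made-up names for illustration, and the Riva calls should be double-checked against whatever client version we end up installing:

```python
from typing import Iterable, Iterator

# Assumed address of a locally running Riva server (hypothetical default).
RIVA_URI = "localhost:50051"


def chunk_pcm(pcm: bytes, chunk_bytes: int = 3200) -> Iterator[bytes]:
    """Split raw PCM into fixed-size chunks for streaming.

    3200 bytes is 100 ms of 16 kHz, 16-bit, mono audio.
    """
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def transcribe(audio_chunks: Iterable[bytes], uri: str = RIVA_URI) -> Iterator[str]:
    """Yield final transcripts for a stream of 16 kHz mono PCM chunks."""
    # First, import the Riva API. Imported lazily so this module still
    # loads on machines without the client package installed.
    import riva.client

    # Next, create a gRPC channel to the Riva endpoint.
    auth = riva.client.Auth(uri=uri)
    asr = riva.client.ASRService(auth)

    # Then, create an ASR request (streaming, final results only).
    config = riva.client.RecognitionConfig(
        encoding=riva.client.AudioEncoding.LINEAR_PCM,
        sample_rate_hertz=16000,
        audio_channel_count=1,
        language_code="en-US",
        max_alternatives=1,
        enable_automatic_punctuation=True,
    )
    streaming_config = riva.client.StreamingRecognitionConfig(
        config=config, interim_results=False
    )

    responses = asr.streaming_response_generator(
        audio_chunks=audio_chunks, streaming_config=streaming_config
    )
    for response in responses:
        for result in response.results:
            if result.is_final and result.alternatives:
                yield result.alternatives[0].transcript
```

With a server running, something like `for text in transcribe(chunk_pcm(pcm_bytes)): print(text)` would print each finalized utterance. The 16 kHz mono format is a guess at what the ASR model wants, not something confirmed.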
TODO - figure out whether the stream format from the RTC track can be sent directly to Riva or whether a transformation is needed.
TODO - figure out how to launch a local Riva server to support local TTS.
TODO - figure out where the overlay text will appear: in the chat log, or in another part of the CLI? Do we need to refactor the CLI UI (#5) first?
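On the stream format TODO: if a transformation does turn out to be needed, it might look roughly like the sketch below. It assumes the RTC track hands us interleaved 16-bit PCM at 48 kHz stereo and that Riva wants 16 kHz mono; neither is confirmed. `to_riva_pcm` is a hypothetical helper, and the averaging is a crude stand-in for a real resampler:

```python
from array import array


def to_riva_pcm(frame: bytes, channels: int = 2,
                in_rate: int = 48000, out_rate: int = 16000) -> bytes:
    """Downmix interleaved 16-bit PCM to mono and decimate by an integer factor.

    Samples are read in the machine's native byte order.
    """
    if in_rate % out_rate != 0:
        raise ValueError("this sketch only handles integer rate ratios")
    step = in_rate // out_rate

    samples = array("h")
    samples.frombytes(frame)

    # Downmix: average the channels of each interleaved sample group.
    mono = [
        sum(samples[i:i + channels]) // channels
        for i in range(0, len(samples) - channels + 1, channels)
    ]

    # Decimate: average each run of `step` samples (a very rough low-pass).
    out = array("h", (
        sum(mono[i:i + step]) // step
        for i in range(0, len(mono) - step + 1, step)
    ))
    return out.tobytes()
```

For example, six stereo samples with values 300, 600, 900, 1200, 1500, 1800 on both channels come out as two mono samples, 600 and 1500. A proper resampling library would be the better choice if quality matters.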
Possibly leverage this open-source speech engine (not the cloud service; run it locally or on our servers): https://opensource.googleblog.com/2019/08/bringing-live-transcribes-speech-engine.html