collabinator / clivrt

:busts_in_silhouette: :computer: a CLI app that provides real-time video chat. (clivrt = CLI Video Real Time)
Apache License 2.0

Overlay audio with autogenerated closed captions (aka STT) #2

Open dudash opened 2 years ago

dudash commented 2 years ago

Possibly leverage this open-source speech engine (not the cloud service; run it locally or on our servers): https://opensource.googleblog.com/2019/08/bringing-live-transcribes-speech-engine.html

dudash commented 2 years ago

Was thinking we could use NVIDIA Riva. Did some brainstorming in Miro here: https://miro.com/app/board/uXjVOZLd2gQ=/

dudash commented 2 years ago

Notionally we will:

1. Import the Riva API.
2. Create a gRPC channel to the Riva endpoint.
3. Create an ASR request.
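A rough sketch of those three steps, assuming the `nvidia-riva-client` Python package (`riva.client`) and a Riva server listening on `localhost:50051`. The endpoint, the 16 kHz sample rate, and the helper names `asr_settings` and `stream_captions` are illustrative assumptions, not settled choices for clivrt.

```python
from typing import Dict, Iterable, Iterator, Tuple

RIVA_URI = "localhost:50051"  # assumption: local Riva server on its usual gRPC port
SAMPLE_RATE_HZ = 16000        # assumption: audio is resampled to 16 kHz mono first


def asr_settings(language: str = "en-US") -> Dict[str, object]:
    """Recognition settings as a plain dict, inspectable without Riva installed."""
    return {
        "language_code": language,
        "sample_rate_hertz": SAMPLE_RATE_HZ,
        "max_alternatives": 1,
        "enable_automatic_punctuation": True,
    }


def stream_captions(
    audio_chunks: Iterable[bytes], uri: str = RIVA_URI
) -> Iterator[Tuple[bool, str]]:
    """Yield (is_final, transcript) pairs for a stream of raw PCM chunks."""
    # Step 1: import the Riva API (lazily, so this module loads without it).
    import riva.client

    # Step 2: create a gRPC channel to the Riva endpoint.
    auth = riva.client.Auth(uri=uri)
    asr = riva.client.ASRService(auth)

    # Step 3: create a streaming ASR request.
    config = riva.client.RecognitionConfig(
        encoding=riva.client.AudioEncoding.LINEAR_PCM,
        **asr_settings(),
    )
    streaming_config = riva.client.StreamingRecognitionConfig(
        config=config, interim_results=True
    )
    responses = asr.streaming_response_generator(
        audio_chunks=audio_chunks, streaming_config=streaming_config
    )
    for response in responses:
        for result in response.results:
            if result.alternatives:
                yield result.is_final, result.alternatives[0].transcript
```

Interim results (`is_final` false) could be used to update a caption line in place, with final results appended to the chat log; that split is only one possible design.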

TODO:

- Figure out whether the stream format from the RTC track can be sent directly to Riva or whether a transformation is needed.
- Figure out how to launch a local Riva server to support local speech-to-text (ASR).
- Figure out where the overlay text will appear: in the chat log, or in another part of the CLI? Do we need to refactor the CLI UI (#5) first?
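On the first TODO, if a transformation does turn out to be needed, it might look something like the sketch below. It assumes the RTC track delivers interleaved 48 kHz stereo signed 16-bit PCM and that the ASR side wants 16 kHz mono; both are assumptions to verify against the actual track and the Riva config. The helper name `to_mono_16k` is hypothetical, and the averaging decimation stands in for a proper low-pass resampler.

```python
from array import array


def to_mono_16k(
    pcm: bytes, channels: int = 2, src_rate: int = 48000, dst_rate: int = 16000
) -> bytes:
    """Downmix interleaved signed 16-bit PCM to mono and decimate to dst_rate."""
    if src_rate % dst_rate != 0:
        raise ValueError("this sketch only handles integer decimation factors")
    step = src_rate // dst_rate

    samples = array("h")  # native-endian signed 16-bit integers
    samples.frombytes(pcm)

    # Average the channels of each frame to get one mono sample per frame.
    frames = len(samples) // channels
    mono = [
        sum(samples[i * channels:(i + 1) * channels]) // channels
        for i in range(frames)
    ]

    # Naive decimation: average each full group of `step` mono samples.
    # A real implementation would use a proper anti-aliasing filter.
    out = array(
        "h",
        (sum(mono[i:i + step]) // step for i in range(0, len(mono) - step + 1, step)),
    )
    return out.tobytes()
```

The resulting byte chunks could then be fed to a streaming ASR request as raw linear PCM.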

dudash commented 2 years ago

https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-overview.html#streaming-recognition