ashawkey / RAD-NeRF

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition

Running the code in stream mode #14

Open pegahs1993 opened 1 year ago

pegahs1993 commented 1 year ago

How can we stream audio with asr.py, so that the audio does not have to be provided as a file? For example, so that my words can be repeated in real time by the talking head.

Thanks a lot @ashawkey

ashawkey commented 1 year ago

@tylersky1993 Hi, unfortunately the current streaming mode does not perform well, since the ASR model we use is not specifically designed for real-time ASR (it requires at least 1 second of input to make a good prediction). If you really want to try the streaming mode, you can find example code here. Note that GUI mode is required, and the ASR sliding window (-l, -m, -r) is set smaller for lower latency.
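
For reference, the sliding-window idea behind -l/-m/-r is roughly like the sketch below. This is only a simplified illustration, not the actual asr.py implementation; `SlidingWindowASR`, `asr_fn`, and the frame counts are made up for the example.

```python
# Simplified sketch of a sliding-window streaming buffer (illustrative only,
# not the repo's asr.py). Assumes 16 kHz mono audio arriving in 20 ms frames.
import numpy as np
from collections import deque

SAMPLE_RATE = 16000                     # assumed input sample rate
FRAME = SAMPLE_RATE // 50               # 20 ms of samples per frame
LEFT, MID, RIGHT = 10, 8, 4             # illustrative context sizes, not asr.py defaults

class SlidingWindowASR:
    """Buffer incoming frames and run ASR once left/mid/right context is filled."""

    def __init__(self, asr_fn):
        self.asr_fn = asr_fn                            # callable: waveform -> audio features
        self.frames = deque(maxlen=LEFT + MID + RIGHT)  # rolling context window

    def push(self, frame: np.ndarray):
        """Append one 20 ms frame; return ASR output once enough context exists."""
        self.frames.append(frame)
        if len(self.frames) < self.frames.maxlen:
            return None                                 # still buffering (this is the ~1 s latency)
        window = np.concatenate(self.frames)            # left + mid + right context
        return self.asr_fn(window)                      # features corresponding to the mid frames
```

The left/right context is what forces the buffering: shrinking -l/-m/-r lowers latency but gives the ASR model less context to work with.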

amuvarma13 commented 11 months ago

@ashawkey Do you think chopping up the audio and feeding it in one-second chunks could work?
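
For illustration, that chopping idea might look something like the sketch below. This is a sketch only: `stream_file` and `feed` are hypothetical helpers, not part of RAD-NeRF, and `soundfile` is assumed for loading the waveform.

```python
# Split a waveform into fixed ~1 s chunks and hand them to a consumer one by
# one, simulating a live stream. feed() is a placeholder for whatever consumes
# audio in streaming mode.
import numpy as np
import soundfile as sf

def stream_file(path: str, feed, chunk_sec: float = 1.0):
    audio, sr = sf.read(path, dtype="float32")
    if audio.ndim > 1:
        audio = audio.mean(axis=1)              # downmix to mono
    chunk = int(sr * chunk_sec)
    for start in range(0, len(audio), chunk):
        piece = audio[start:start + chunk]
        if len(piece) < chunk:                  # zero-pad the final partial chunk
            piece = np.pad(piece, (0, chunk - len(piece)))
        feed(piece)                             # e.g. push onto the streaming ASR queue

# usage (with a dummy consumer):
# stream_file("speech.wav", feed=lambda x: print(x.shape))
```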