ashawkey / RAD-NeRF

Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition
MIT License
878 stars 153 forks

ChatGPT with RAD-NeRF #20

Open boolw opened 1 year ago

boolw commented 1 year ago

I'm trying to drive the model with the text replies from ChatGPT, but the call path is long:

text->TTS->wav->ASR->npy(logits/text)->RAD-NeRF

Such conversions seem inefficient and redundant.

Is there a simpler and more efficient way to do this?

text->[???]->RAD-NeRF

For example, is it possible to train or test the model directly on the mel spectrogram output by the TTS acoustic model (AM), skipping the wav and ASR steps?
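One detail any such shortcut would have to handle is timing: the acoustic model emits mel frames at its own rate (sample rate divided by hop length), which generally differs from the video frame rate. Below is a minimal sketch of one way to pair each video frame with a small window of mel frames. Everything in it is an assumption for illustration: the function names are hypothetical, and the 16 kHz sample rate, hop length of 160, 25 fps video, and context size are example values, not settings taken from RAD-NeRF or any particular TTS system. The model would presumably also have to be trained on such windows rather than on ASR features.

```python
from typing import List

Frame = List[float]  # one mel frame: n_mels float values


def mel_frame_rate(sample_rate: int, hop_length: int) -> float:
    """Mel frames per second for a given sample rate and hop length."""
    return sample_rate / hop_length


def windows_per_video_frame(
    mel: List[Frame], mel_fps: float, video_fps: float, context: int
) -> List[List[Frame]]:
    """Return one window of 2*context+1 mel frames per video frame.

    Each window is centred on the mel frame closest in time to the video
    frame. Indices past either end are clamped, which repeats the edge frame.
    """
    if not mel:
        return []
    last = len(mel) - 1
    # Number of video frames covered by the audio duration.
    n_video = int(len(mel) / mel_fps * video_fps)
    windows = []
    for v in range(n_video):
        center = min(int(round(v * mel_fps / video_fps)), last)
        idx = [min(max(center + k, 0), last) for k in range(-context, context + 1)]
        windows.append([mel[i] for i in idx])
    return windows


if __name__ == "__main__":
    # Hypothetical example values: 16 kHz audio, hop of 160 samples, 25 fps video.
    fps = mel_frame_rate(16000, 160)
    # Dummy "spectrogram": 100 frames of 3 bins, each filled with its frame index.
    dummy_mel = [[float(t)] * 3 for t in range(100)]
    wins = windows_per_video_frame(dummy_mel, fps, 25.0, context=2)
    print(len(wins), len(wins[0]), len(wins[0][0]))
```

Clamping at the edges keeps every window the same length, so the windows can be stacked into a fixed-size batch; zero padding would be an equally plausible choice.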

ashawkey commented 1 year ago

This is reasonable, since there are existing works that use spectrograms as input, but it would need some experiments to verify.

aishoot commented 1 year ago

Nice work. I'm also trying it.

exceedzhang commented 1 year ago

Good idea!