Open boolw opened 1 year ago
I'm trying to drive the model with the text replies from ChatGPT, but the call path is long:
text->TTS->wav->ASR->npy(logits/text)->RAD-NeRF
Such conversions seem inefficient and redundant.
Is there a simpler, more efficient way?
text->[???]->RAD-NeRF
For example, would it be possible to train or test the model on the mel spectrogram output by the TTS acoustic model (AM)?
This seems reasonable, since there are existing works that use spectrograms as input, but it may need some experiments to verify.
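As a rough illustration of what feeding mel features directly might involve, here is a minimal sketch that slices a mel spectrogram into one fixed-size window per video frame. The function name `window_mel_per_frame`, the 25 fps video rate, and the 16-step window are all assumptions made for this example, not values confirmed by the RAD-NeRF code, and the audio branch would presumably have to be retrained on this kind of input.

```python
import numpy as np

def window_mel_per_frame(mel, mel_rate, video_fps=25.0, win=16):
    """Cut a mel spectrogram into one window of `win` steps per video frame.

    mel:       array of shape (T, n_mels), e.g. the TTS acoustic model output
    mel_rate:  mel steps per second (sample_rate / hop_length)
    video_fps: assumed video frame rate (hypothetical default)
    win:       assumed window length in mel steps (hypothetical default)
    """
    mel = np.asarray(mel, dtype=np.float32)
    if mel.ndim != 2:
        raise ValueError("mel must have shape (T, n_mels)")
    total_steps = mel.shape[0]
    n_frames = int(np.floor(total_steps / mel_rate * video_fps))
    half = win // 2
    # Repeat the first/last mel step so windows near the edges stay full size.
    padded = np.pad(mel, ((half, win - half), (0, 0)), mode="edge")
    windows = np.empty((n_frames, win, mel.shape[1]), dtype=np.float32)
    for i in range(n_frames):
        # Mel step closest to the timestamp of video frame i.
        center = min(int(round(i * mel_rate / video_fps)), total_steps - 1)
        # In padded coordinates the window [center - half, center + win - half)
        # starts at index `center`.
        windows[i] = padded[center:center + win]
    return windows
```

With this layout, `windows[i][win // 2]` is the mel step aligned with video frame `i`, so the path would become text->TTS(AM)->mel->RAD-NeRF without the vocoder and ASR stages. Whether mel features drive the lips as well as ASR features is exactly what would need to be tested.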
Nice work. I'm also trying it.
good idea!