benlower opened 1 month ago
Technically, for all of these language requests, one can retrain or fine-tune the base model. With speech-to-speech, the dataset seems like a bigger moat than ever.
Can you elaborate on the speech-to-speech part? Doesn't Ultravox generate text tokens and use a TTS system for the speech output?