Status: Closed (closed by sipie800 4 months ago)
This happens because our SFT data did not include instructions such as "read a certain paragraph aloud". Any2Any covers too many tasks for us to take care of all of them. You can have the model generate speech directly through voice dialogue, or use the base model directly for TTS.
I tried TTS with this format:
interleaved|Read this sentence aloud, this is input: Today is a sunny day.|speech
It just replies in text with "doesn't it"; no audio is generated.
Besides, in some other attempts it falls into "modality hallucinations": it draws an image or makes music, but never does TTS.
What is the right prompt? Or can't this be done with the chat model?