[Open] ehosseiniasl opened this issue 1 month ago
Hello, part of the prompt in this file was used for debugging. Please refer to this line: https://github.com/OpenMOSS/AnyGPT/blame/6404dbafccc10943be6bf6e24a4b99b3a6545501/anygpt/src/m_utils/prompter.py#L113
So for voice commands and voice replies, we actually use the 'Speech-Instruction' prompt.
Thanks. Do you have direct speech-response generation (without text-response generation) for the base or chat model? Which speech-response tasks are included in instruction tuning?
Using Speech-Instruction on the chat model, the response is as below (to_modality=speech).
Could you please explain what the first line is: <-Res-> Gmarin misway"- How beautiful you look today!
Does the model first generate a text reply, then speech, even if the output modality is speech only?
response:
: <-Res-> Gmarin misway"- How beautiful you look today!
[AnyGPT] "Guhmyayayay!" - How beautiful you look today! <sosp> <🗣️691> <🗣️691> <🗣️60> <🗣️868> <🗣️868> <🗣️906> <🗣️316> <🗣️1015> <🗣️965> <🗣️512> <🗣️512> <🗣️223> <🗣️223> <🗣️689> <🗣️35> <🗣️35> <🗣️35> <🗣️962> <🗣️57> <🗣️943> <🗣️699> <🗣️1> <🗣️118> <🗣️118> <🗣️118>
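For reference, a response like the one above can be mechanically split into its text reply and its speech token IDs. This is only an illustrative parser: the layout assumed here (text reply, then <sosp>, then <🗣️N> tokens) is taken from the sample output shown, not from AnyGPT's own decoding code.

```python
import re

# Sample AnyGPT chat-model output, abbreviated from the response above.
raw = ('[AnyGPT] "Guhmyayayay!" - How beautiful you look today! '
       '<sosp> <🗣️691> <🗣️691> <🗣️60>')

def split_reply(output: str):
    """Split an AnyGPT-style response into (text reply, speech token IDs).

    Assumes speech tokens look like <🗣️N> and start after <sosp>,
    as in the sample output; this is not the project's own parser.
    """
    head, _, tail = output.partition("<sosp>")
    text = head.replace("[AnyGPT]", "").strip()
    ids = [int(m) for m in re.findall(r"<🗣️(\d+)>", tail)]
    return text, ids

text, ids = split_reply(raw)
```

The text part would then be shown to the user (or discarded), while the token IDs are passed to the vocoder to synthesize the spoken reply.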
Does the prompt include the user speech transcription? The sentence after <-Res-> is the transcription of the speech instruction I provided.
Hello, we provide some training data samples and related descriptions; please refer to https://github.com/OpenMOSS/AnyGPT?tab=readme-ov-file#pretraining-and-sft
In voice dialogue mode, the user provides a voice command; the model recognizes the text command, generates a text reply, and finally generates the speech for that reply.
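That three-stage flow can be sketched as below. All helpers here are stubs: tokenize_speech, lm_generate, and vocode merely stand in for AnyGPT's speech tokenizer, language model, and vocoder, and the <eosp> end marker is an assumption (only <sosp> appears in the sample output above).

```python
# Sketch of the voice-dialogue flow: speech in -> recognized command ->
# text reply -> speech tokens -> speech out. Stubs only, not AnyGPT's API.

def tokenize_speech(audio):
    """Stub: audio -> discrete speech tokens."""
    return ["<🗣️691>", "<🗣️60>"]

def lm_generate(prompt):
    """Stub LM output: recognized command, text reply, then speech tokens."""
    return ('<-Res-> How do I look? '
            '[AnyGPT] You look great! '
            '<sosp> <🗣️1> <🗣️2> <eosp>')

def vocode(tokens):
    """Stub: speech tokens -> waveform bytes (dummy)."""
    return b"\x00" * len(tokens)

def voice_dialogue_turn(audio):
    prompt = " ".join(tokenize_speech(audio))
    out = lm_generate(prompt)
    # Keep only the tokens between the (assumed) <sosp>/<eosp> markers.
    speech = out.split("<sosp>")[1].split("<eosp>")[0].split()
    return vocode(speech)
```

The point of the sketch is the ordering: the text reply is produced before the speech tokens, which matches the behavior observed in the response above even when to_modality=speech.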
https://github.com/OpenMOSS/AnyGPT/blame/6404dbafccc10943be6bf6e24a4b99b3a6545501/anygpt/src/m_utils/prompter.py#L45
Hello, is this line correct? Is it for speech-to-speech conversation? In that case, isn't this the correct prompt: