0nutation / SpeechGPT

SpeechGPT Series: Speech Large Language Models
https://0nutation.github.io/SpeechGPT.github.io/
Apache License 2.0

Some questions about cross-modal instruction data #14

Closed ZhikangNiu closed 9 months ago

ZhikangNiu commented 9 months ago

The paper says the instruction format is: [Human]:{D}. This is input: {U} . [SpeechGPT]: {T}. But the project page shows the instruction format as:

[
    {
        "prefix": "You are an AI assistant whose name is SpeechGPT.\n- SpeechGPT is a intrinsic cross-modal conversational language model that is developed by Fudan University.  SpeechGPT can understand and communicate fluently with human through speech or text chosen by the user.\n- It can perceive cross-modal inputs and generate cross-modal outputs.\n",
        "plain_text": "[Human]: Try to speak out this sentence, please. This is input: The alchemist rode in front, with the falcon on his shoulder.<eoh> [SpeechGPT]: <sosp><661><588><604><157><596><499><596><106><596><189><63><189><665><991><162><202><393><946><327><905><907><597><660><351><557><794><788><59><754><12><977><877><333><873><835><67><940><118><686><613><169><72><644><553><535><935><101><741><384><173><894><787><380><787><196><555><721><944><250><56><812><222><915><143><390><479><330><435><647><246><650><816><325><506><686><208><613><417><755><193><411><452><111><735><6><735><63><665><644><991><535><271><333><196><918><29><202><393><946><734><390><479><330><776><167><761><907><597><660><351><557><794><75><788><15><366><896><627><168><654><659><177><183><609><710><187><493><361><470><821><59><56><198><912><742><840><431><531><76><668><576><803><791><380><660><325><801><549><366><377><164><309><584><605><193><71><39><eosp><eoa> "
    }
]

In that example the input is text and the response is speech units, so the instruction format I found is: [Human]:{D}. This is input: {T} . [SpeechGPT]: {U}. Which one is correct?
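
To make the mismatch concrete, here is a minimal sketch that splits a `plain_text` sample on its markers and reports which slot holds text (T) and which holds discrete units (U). The helpers `split_sample` and `modality` are hypothetical names of my own, not part of the SpeechGPT repo, the layout regex is only my reading of the sample above, and the unit sequence is shortened for readability.

```python
import re

# A unit span is assumed to look like <sosp><12><345>...<eosp>.
UNIT_SPAN = re.compile(r"<sosp>(?:<\d+>)+<eosp>")


def modality(segment: str) -> str:
    """Return 'U' if the segment contains a discrete-unit span, else 'T' (text)."""
    return "U" if UNIT_SPAN.search(segment) else "T"


def split_sample(plain_text: str):
    """Split a sample into (description, input, output).

    Assumed layout (inferred from the project-page example, not confirmed):
    "[Human]: {D} This is input: {X}<eoh> [SpeechGPT]: {Y}<eoa>"
    """
    match = re.match(
        r"\[Human\]:\s*(?P<desc>.*?)\s*This is input:\s*(?P<inp>.*?)<eoh>\s*"
        r"\[SpeechGPT\]:\s*(?P<out>.*?)<eoa>",
        plain_text,
        flags=re.DOTALL,
    )
    if match is None:
        raise ValueError("sample does not follow the assumed layout")
    return match.group("desc"), match.group("inp"), match.group("out")


# Same sample as above, with the unit sequence shortened to three units.
sample = (
    "[Human]: Try to speak out this sentence, please. This is input: "
    "The alchemist rode in front, with the falcon on his shoulder.<eoh> "
    "[SpeechGPT]: <sosp><661><588><604><eosp><eoa> "
)

desc, inp, out = split_sample(sample)
print(modality(inp), "->", modality(out))  # prints: T -> U
```

For this sample the input slot is text and the response slot is units, which is the reverse of the {U} then {T} order quoted from the paper.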