how to run SpeechGPT on mixed audio+text benchmark data

Hi, thanks a lot for sharing your benchmark data and code!

I found SpeechGPT on your leaderboards and would like to ask whether you ran this model on your benchmark yourselves. If so, could you explain, or point me to the code showing, how you made SpeechGPT process the mixed-modality input that your benchmark requires? By mixed-modality input I mean that the model receives an audio signal plus text instructions. See also this still-unanswered issue in the SpeechGPT project. Any hint is appreciated. Thanks a lot!

OFA-Sys / AIR-Bench