Closed ZhikangNiu closed 9 months ago
We haven't tested the ASR performance of SpeechGPT on a standard dataset. The performance of the 7B model is still not perfect, and its robustness on tasks like ASR is unsatisfactory. Regarding the error output: due to limitations in the training data, the model may misidentify the task, for example mistaking an ASR task for a speech dialogue task. Special tokens like [ua] and [ta] stand for 'unit answer' and 'text answer' respectively. This design is part of our chain-of-modality approach; you can refer to the paper and the cases in the SpeechInstruct chain-of-modality dataset for more details.
Thanks for your answer, SpeechGPT is promising work. I hope you can release the 13B model as soon as possible. However, I also found that the model sometimes doesn't respond (i.e., it returns an empty response), and this is not a rare phenomenon. Besides, when we use the model for the ASR task, we don't want special tokens to appear in the output. How do I avoid these situations?
If you want to use SpeechGPT for ASR or TTS tasks, it is recommended to use only SpeechGPT-7B-CM without the SpeechGPT-7B-com adapter.
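As a stopgap for the special-token issue, the tokens can also be stripped in post-processing. Below is a minimal sketch; the bracketed control tokens [ua] and [ta] come from this thread, while the angle-bracketed token pattern (e.g. speech unit tokens) is an assumption about the output format and may need adjusting:

```python
import re

def strip_special_tokens(text: str) -> str:
    """Remove SpeechGPT-style control tokens from model output.

    Assumes two token shapes: bracketed control tokens like [ua]/[ta]
    (mentioned in this thread) and angle-bracketed tokens like <sosp>
    or <41> (an assumption about the unit-token format).
    """
    # Drop bracketed control tokens such as [ua], [ta]
    text = re.sub(r"\[(?:ua|ta)\]", "", text)
    # Drop angle-bracketed unit/control tokens such as <sosp>, <41>
    text = re.sub(r"<[^<>]*>", "", text)
    # Collapse leftover whitespace
    return re.sub(r"\s+", " ", text).strip()

print(strip_special_tokens("[ta] hello world <sosp><41><eosp>"))
```

Alternatively, if the model is run through Hugging Face `transformers`, passing the special-token IDs via the `bad_words_ids` argument of `generate()` suppresses them at decoding time rather than after the fact.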
Thanks for your answer, I will test it and share the results with you.
Thanks for your amazing work. I want to know whether you have evaluated the ASR task with quantitative metrics (e.g., WER). When I run the ASR task, I also find some erroneous output, for example: