MingLunHan closed this issue 1 year ago
@MingLunHan
Fantastic work from both of you!
On another note, do you have an estimated timeline for when the code of X-LLM will be available to the public?
@tensorboy
Hello!
We are currently busy with our doctoral thesis defenses and graduation-related matters. The code for X-LLM will be released as soon as possible, within the next 1-2 weeks.
Thanks for your attention!
Sorry about that. We have corrected the mistake and uploaded the latest version of the paper to arXiv.
@0nutation Thank you very much!
Dear authors,
Good evening!
I recently noticed your research paper "SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities" on arXiv.
However, it contains an error in its description of our recent work, "X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages".
Our design is not a cascade architecture that keeps ASR and the large language model separate and then connects them. Instead, we use a pre-trained acoustic encoder based on CIF (the Continuous Integrate-and-Fire mechanism), which connects to the LLM through the S2L interface. Our work takes a representation-based, end-to-end modeling approach to LLM-based speech recognition.
I hope you can correct this as soon as possible to avoid any public misunderstanding of our work.
Thanks!
Minglun.