0nutation / SpeechGPT

SpeechGPT Series: Speech Large Language Models
https://0nutation.github.io/SpeechGPT.github.io/
Apache License 2.0

An Error about the description in the related works #2

Status: Closed (closed by MingLunHan 1 year ago)

MingLunHan commented 1 year ago

Dear authors,

Good evening!

I recently noticed your research paper "SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities" published on arXiv.

However, its related-work section describes our recent work, "X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages", incorrectly.

Our design is not a cascade architecture that keeps ASR and the large language model separate and then connects them. Instead, we use a pre-trained acoustic encoder based on CIF (Continuous Integrate-and-Fire mechanism) and connect it to the LLM through the S2L interface. Our work takes a representation-based, end-to-end modeling approach to LLM-based speech recognition.
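For readers unfamiliar with CIF, the integrate-and-fire step can be sketched roughly as follows. This is only an illustration of the general mechanism, not X-LLM's code: the function name `cif`, the list-based frames, the threshold of 1.0, and the assumption that no single frame weight exceeds the threshold are all choices made for this sketch.

```python
from typing import List


def cif(frames: List[List[float]],
        weights: List[float],
        threshold: float = 1.0) -> List[List[float]]:
    """Toy Continuous Integrate-and-Fire over a sequence of encoder frames.

    frames:  one feature vector per acoustic frame (hypothetical encoder output)
    weights: one non-negative weight per frame, each assumed <= threshold
    Returns one integrated vector per firing (roughly one per output token).
    """
    dim = len(frames[0]) if frames else 0
    acc = 0.0                 # weight accumulated since the last firing
    state = [0.0] * dim       # weighted sum of frames since the last firing
    fired: List[List[float]] = []

    for h, a in zip(frames, weights):
        if acc + a < threshold:
            # Not enough accumulated weight yet: keep integrating.
            acc += a
            state = [s + a * x for s, x in zip(state, h)]
        else:
            # Threshold reached: use just enough of this frame's weight
            # to complete the current unit, then fire it.
            used = threshold - acc
            fired.append([s + used * x for s, x in zip(state, h)])
            # The leftover weight starts the next unit.
            rest = a - used
            acc = rest
            state = [rest * x for x in h]
    return fired


if __name__ == "__main__":
    frames = [[1.0], [2.0], [3.0], [4.0]]
    weights = [0.5, 0.5, 0.5, 0.5]
    print(cif(frames, weights))
```

The fired vectors are continuous representations rather than decoded text, which is the sense in which such an encoder can feed an LLM directly instead of passing ASR transcripts through a cascade.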

I hope you can correct this as soon as possible so that readers do not misunderstand our work.

Thanks!

Minglun.

tensorboy commented 1 year ago

@MingLunHan

Fantastic work from both of you!

On another note, do you have an estimated timeline for when the code of X-LLM will be available to the public?

MingLunHan commented 1 year ago

@tensorboy

Hello!

We are currently busy with our doctoral thesis defenses and graduation-related matters. The code of X-LLM will be released as soon as possible, within 1-2 weeks.

Thanks for your attention!

0nutation commented 1 year ago

Sorry about that. We have corrected the mistake and uploaded the latest version of the paper to arXiv.

MingLunHan commented 1 year ago

@0nutation Thank you very much!