HITsz-TMG / UMOE-Scaling-Unified-Multimodal-LLMs

Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"
https://uni-moe.github.io/

Audio Understanding for Uni-MoE v2 #6

Open IanZ2020 opened 2 weeks ago

IanZ2020 commented 2 weeks ago

I found that Uni-MoE v2 is not trained on audio understanding tasks and does not utilize the BEATs audio encoder.

Is Uni-MoE v2 not designed to understand general audio events, such as natural sounds?

IanZ2020 commented 2 weeks ago

Also, did Uni-MoE v1 train two separate MoE models for processing audio and speech, respectively? Is there any way to integrate both Uni-MoE-Audio and Uni-MoE-Speech?

expapa commented 2 weeks ago

Thank you for your interest in our work. We are currently working on resolving this issue, and these features will be introduced in future versions.