csuhan / OneLLM

[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

Audio-Video-Text Evaluation Scripts are missing (Table 4 of OneLLM paper) #29

Open · vittoriopipoli opened this issue 4 weeks ago

vittoriopipoli commented 4 weeks ago

Hi @csuhan,

I recently came across your paper presented at CVPR 2024, which introduces the OneLLM model. I found the work highly interesting and particularly relevant to my research. I am keen to study in detail the scenarios where OneLLM processes inputs from multiple modalities, such as the audio-video-text cases reported in Table 4 of your paper.

However, upon reviewing this repository, I was unable to locate the scripts that handle experiments involving more than two modalities. Could you kindly share the code for the three-modality cases, or advise me on how to set up such experiments? A sketch of what I have in mind follows.
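For context, here is a minimal sketch of what I imagine a three-modality forward pass might look like, based only on the paper's description of a shared encoder with per-modality projectors feeding tokens into the LLM. The names `encode_modality`, `prefix_embeds`, and `max_gen_len` below are hypothetical placeholders, not this repo's actual API.

```python
import torch

@torch.no_grad()
def audio_video_text_inference(model, video, audio, prompt):
    """Sketch of a three-modality (audio-video-text) forward pass.

    `model.encode_modality` stands in for OneLLM's shared encoder plus
    modality-specific projector; the real entry point may be named and
    structured differently.
    """
    # Encode each non-text modality into LLM-space tokens.
    video_tokens = model.encode_modality(video, modality="video")  # (1, Nv, D)
    audio_tokens = model.encode_modality(audio, modality="audio")  # (1, Na, D)

    # Concatenate the projected tokens and prepend them to the text
    # prompt. This is one plausible reading of the audio-video-text
    # setting, not necessarily what Table 4 actually does.
    mm_tokens = torch.cat([video_tokens, audio_tokens], dim=1)
    return model.generate(prompt, prefix_embeds=mm_tokens, max_gen_len=128)
```

In particular, it is unclear to me whether the Table 4 setting concatenates, interleaves, or otherwise fuses the projected audio and video tokens, which is exactly what the official script would clarify.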

I would greatly appreciate any assistance or guidance you can provide on this matter. Thank you for your time, and I look forward to your response.

qixueweigitbub commented 3 weeks ago

I have the same request. I am waiting for a simple demo script that runs the model on video with audio input.