Hi @csuhan,
I recently came across your paper presented at CVPR 2024, where you introduced the OneLLM model. I found the work highly interesting and particularly relevant to my research. I would like to study in detail the scenarios where OneLLM processes inputs from more than two modalities, such as the audio-video-text cases reported in Table 4 of your paper.
However, reviewing this repository, I was unable to find the scripts for experiments involving more than two modalities. Could you kindly share the code for the three-modality cases, or guide me on how to set up such experiments?
I would greatly appreciate any assistance or guidance you can provide on this matter. Thank you for your time, and I look forward to your response.