Ahnsun / merlin

[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
https://ahnsun.github.io/merlin/

To use vicuna model as text decoder #4

Open minjung98 opened 5 months ago

minjung98 commented 5 months ago

Hello,

Your project seems really interesting. I have a question regarding the execution of sh playground/merlin/clip-large+conv+vicuna-v15-7b/pretrain.sh. In the file, it says --model_name_or_path /path/models--lmsys--vicuna-7b-v15 \. If I want to use lmsys/vicuna-7b-v15 as the text decoder, do I need to download the models manually, place them in a specific path, and modify the path accordingly? Should I download all the files and place them in a specific path as shown in the picture below?

[Screenshot 2024-06-25, 1:23:45 AM]

I would appreciate it if you could provide a guide on how to set up the vicuna-7b-v15 model.

And could you please let me know the required CUDA version to run this file? I encountered an error stating that the libcudart.so.12 file is missing, so I set up the environment with CUDA 12.0. However, I got an error indicating that the version does not match with flash-attn. When I switched to CUDA 11.7, I again encountered the missing libcudart.so.12 file error.

Thank you.

Ahnsun commented 5 months ago

Thanks for your interest. Yes, you need to download all of the vicuna-v15 files. The required CUDA version is 11.8.
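
For readers following along, here is a minimal sketch of one way to fetch the full checkpoint and check that the folder looks complete. It is not taken from the Merlin repository. It assumes the huggingface_hub package is installed and that the Hugging Face repo id is lmsys/vicuna-7b-v1.5. The helper names download_vicuna and missing_model_files are hypothetical, and the list of expected files is a rough heuristic rather than an official manifest.

```python
from pathlib import Path
from typing import List

# Assumption: the checkpoint lives under this Hugging Face repo id.
REPO_ID = "lmsys/vicuna-7b-v1.5"


def download_vicuna(local_dir: str, repo_id: str = REPO_ID) -> str:
    """Download every file of the repo into local_dir and return the folder path."""
    # Imported lazily so the check below works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


def missing_model_files(model_dir: str) -> List[str]:
    """Return descriptions of files that appear to be absent from model_dir."""
    root = Path(model_dir)
    missing = []
    # Model configuration.
    if not (root / "config.json").is_file():
        missing.append("config.json")
    # Tokenizer: either the SentencePiece model or the fast-tokenizer JSON.
    if not any((root / name).is_file() for name in ("tokenizer.model", "tokenizer.json")):
        missing.append("tokenizer.model or tokenizer.json")
    # Weights: safetensors or legacy .bin shards.
    if not (any(root.glob("*.safetensors")) or any(root.glob("*.bin"))):
        missing.append("weight shards (*.safetensors or *.bin)")
    return missing
```

If missing_model_files returns an empty list for the download folder, that folder is presumably what --model_name_or_path in pretrain.sh should point to, replacing the /path/models--lmsys--vicuna-7b-v15 placeholder.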

minjung98 commented 5 months ago

Thank you for your kind response. I'll try again with cuda 11.8.
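
Before retrying, it may help to confirm which CUDA version the installed PyTorch build targets. A missing libcudart.so.12 often means that some installed binary was built against CUDA 12, although that is only one possible cause here. The sketch below is not part of the Merlin repository, and the helper names major_minor, cuda_matches, and torch_cuda_version are hypothetical. It assumes PyTorch is installed and compares only the major and minor version numbers against 11.8.

```python
from typing import Optional, Tuple

REQUIRED_CUDA = "11.8"  # version stated by the maintainer in this thread


def major_minor(version: str) -> Tuple[int, int]:
    """Parse '11.8' or '11.8.0' into (11, 8); a bare '11' becomes (11, 0)."""
    parts = version.strip().split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def cuda_matches(found: Optional[str], required: str = REQUIRED_CUDA) -> bool:
    """True when the found CUDA version has the same major.minor as required."""
    if found is None:  # CPU-only PyTorch builds report None
        return False
    return major_minor(found) == major_minor(required)


def torch_cuda_version() -> Optional[str]:
    """Return the CUDA version PyTorch was built with, or None for CPU builds."""
    import torch  # imported lazily so the helpers above work without PyTorch

    return torch.version.cuda
```

Calling cuda_matches(torch_cuda_version()) in the training environment returns False when PyTorch was built for a different CUDA release, in which case a prebuilt flash-attn wheel is also unlikely to match.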