EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval
https://lmms-lab.github.io/

[Fix] repr llava doc #36

Closed by cocoshe 1 month ago

cocoshe commented 3 months ago

When reproducing the llava model evaluation, errors may occur like the one reported in this issue:

This is because device_map is not passed in the reproduce script here: https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/9dfb53afb49adb6d20607bc3ca989591969f5432/miscs/repr_scripts.sh#L14

which leads to the error in the llava model code: https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/9dfb53afb49adb6d20607bc3ca989591969f5432/lmms_eval/models/llava.py#L68-L73

It seems that when num_processes is > 1, the code goes into the if branch and selects a specific CUDA device, e.g. cuda:0. When num_processes is 1 (which is what the reproduce script uses), the user has to define a device_map strategy, e.g. auto. If we don't pass device_map, its default value is "" (empty).

So the logic seems weird: intuitively, when we try to use multiple GPUs we would need to set a device_map, and when we only use one GPU (num_processes=1) the model should just be placed on a specific CUDA device.
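To make the behavior concrete, here is a minimal sketch of the branching described above. It is a paraphrase, not the exact code at the linked lines in lmms_eval/models/llava.py, and the function name is made up for illustration:

```python
# Paraphrased sketch of the branching described above (illustrative names,
# not the exact lmms_eval/models/llava.py code).
from accelerate import Accelerator

def resolve_device_map(device_map: str = "") -> str:
    accelerator = Accelerator()
    if accelerator.num_processes > 1:
        # Multi-process run: each process gets pinned to its own GPU,
        # and accelerate decides which index this process owns.
        return f"cuda:{accelerator.local_process_index}"
    # Single-process run: whatever the launcher passed (e.g. "auto") is used.
    # The reproduce script passes nothing, so this stays "" and the later
    # model loading fails with an error like the one in the linked issue.
    return device_map
```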

BTW, there's also a typo in the script doc: --log_samples_sufix should be --log_samples_suffix.

kcz358 commented 3 months ago

Hi @cocoshe, this is intended here so that you can enable tensor parallelism when the accelerate process count is 1. When you use more than one process, the accelerator handles the devices and puts the model on each device.

cocoshe commented 3 months ago

Thanks for your reply~ But I'm curious: if we set num_processes > 1 (which means we want to use multiple GPUs?), since device_map is set to something like cuda:0, can we still load the model across multiple GPUs by passing such a specific device_map to from_pretrained?

cocoshe commented 3 months ago

I think I see what you mean: when num_processes = 1, we pass device_map to decide the placement strategy ourselves manually, and when num_processes > 1, accelerate handles the strategy automatically, so we don't need to pass device_map. Is that how the code is designed?

kcz358 commented 3 months ago

Yes, when we set num_processes > 1, each device holds one copy of the model weights and the processes run in parallel; accelerate handles the device index for you. If you do not have enough memory to hold the model, you can set num_processes = 1 and device_map=auto to enable tensor parallelism, or use any other device_map you want.
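For readers less familiar with these two modes, the following is a minimal sketch of what the two device_map choices amount to, using plain transformers. The checkpoint name is a placeholder and this is not lmms-eval's own loading code:

```python
# Minimal sketch of the two placement modes described above, using plain
# transformers; "some/causal-lm-checkpoint" is a placeholder, not a real model.
import torch
from transformers import AutoModelForCausalLM

CKPT = "some/causal-lm-checkpoint"

# num_processes > 1 (data parallel): every process loads a full copy of the
# weights onto its own single GPU, so the whole model maps to one device.
model_dp = AutoModelForCausalLM.from_pretrained(
    CKPT, torch_dtype=torch.float16, device_map={"": "cuda:0"}
)

# num_processes = 1 with device_map="auto": transformers + accelerate shard
# the layers across all visible GPUs, which is the tensor-parallel-style
# fallback when a single GPU cannot hold the whole model.
model_tp = AutoModelForCausalLM.from_pretrained(
    CKPT, torch_dtype=torch.float16, device_map="auto"
)
```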

cocoshe commented 3 months ago

Thanks for your explanation and the great work~ I just pushed a commit for the reproduce doc: it fixes the typo and adds the device_map argument, which is needed since num_processes is 1 in the script; otherwise there will be an error like the one in the issue I mentioned before, which you also helped to solve.

So it is better to add device_map to the script; otherwise it may confuse followers who don't know much about the accelerate and transformers libraries.