Raven625 opened 1 week ago
The same question here: is there a quantitative way of reasoning about this?
same issue
Offload some layers of the visual tokenizer to the CPU using a device map. I use this function to generate the device map:
def makeDeviceMap(llmGpuLayers: int, visGpuLayers: int) -> dict:
    # This model has 42 LLM decoder layers (0-41) and 27 vision encoder layers (0-26).
    llmGpuLayers = min(llmGpuLayers, 41)
    visGpuLayers = min(visGpuLayers, 26)
    deviceMap = dict()
    cpu = "cpu"
    cuda = 0
    # Keep the embeddings, final norm, LM head and visual embedding table on the GPU.
    deviceMap["llm.model.embed_tokens"] = cuda
    deviceMap["llm.model.norm"] = cuda
    deviceMap["llm.lm_head.weight"] = cuda
    deviceMap["vte.weight"] = cuda
    # First llmGpuLayers decoder layers on the GPU, the rest on the CPU; the last layer stays on the GPU.
    deviceMap["llm.model.layers.0"] = cuda
    for l in range(1, llmGpuLayers):
        deviceMap[f"llm.model.layers.{l}"] = cuda
    for l in range(llmGpuLayers, 41):
        deviceMap[f"llm.model.layers.{l}"] = cpu
    deviceMap["llm.model.layers.41"] = cuda
    # Same pattern for the visual tokenizer's encoder layers.
    deviceMap["visual_tokenizer"] = cuda
    deviceMap["visual_tokenizer.backbone.vision_model.encoder.layers.0"] = cuda
    for l in range(1, visGpuLayers):
        deviceMap[f"visual_tokenizer.backbone.vision_model.encoder.layers.{l}"] = cuda
    for l in range(visGpuLayers, 26):
        deviceMap[f"visual_tokenizer.backbone.vision_model.encoder.layers.{l}"] = cpu
    deviceMap["visual_tokenizer.backbone.vision_model.encoder.layers.26"] = cuda
    # print("makeDeviceMap:")
    # for k, v in deviceMap.items():
    #     print(f"{k} -> {v}")
    return deviceMap
It works on my 4090 with arguments of 41 and 6:
self.model = AutoModelForCausalLM.from_pretrained(
    modelPath,
    torch_dtype=torch.bfloat16,
    multimodal_max_length=8192,
    # attn_implementation='flash_attention_2',
    device_map=self.makeDeviceMap(41, 6),
    trust_remote_code=True
)
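On the quantitative question above: one way to pick the layer counts, rather than trial and error, is to sum the per-layer parameter sizes and fit them into a VRAM budget. Below is a rough, untested sketch. The function name estimateLlmGpuLayers and the 20 GB budget are my own inventions; the module-name prefixes are the same ones the device map above uses; whether from_config works for this remote-code model without pulling weights I have not verified.

from collections import defaultdict

from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

def estimateLlmGpuLayers(modelPath: str, vramBudgetBytes: int) -> int:
    # Build the model on the meta device: module shapes only, no weights in RAM or VRAM.
    config = AutoConfig.from_pretrained(modelPath, trust_remote_code=True)
    with init_empty_weights():
        model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

    bytesPerParam = 2            # bfloat16
    perLayer = defaultdict(int)  # bytes per "llm.model.layers.{i}"
    fixedBytes = 0               # everything the device map above keeps on the GPU anyway

    for name, p in model.named_parameters():
        size = p.numel() * bytesPerParam
        if name.startswith("llm.model.layers."):
            perLayer[int(name.split(".")[3])] += size
        else:
            fixedBytes += size   # embeddings, norms, lm_head, vte, visual tokenizer

    # Greedily keep decoder layers on the GPU until the budget is spent.
    usedBytes = fixedBytes
    gpuLayers = 0
    for idx in sorted(perLayer):
        if usedBytes + perLayer[idx] > vramBudgetBytes:
            break
        usedBytes += perLayer[idx]
        gpuLayers += 1
    return gpuLayers

# Leave headroom for activations and the KV cache on a 24 GB card (the budget is a guess).
print(estimateLlmGpuLayers("AIDC-AI/Ovis1.6-Gemma2-9B", 20 * 1024**3))

The same idea could be repeated for the visual tokenizer's encoder layers if the remaining budget is tight.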
I run their HF demo snippet (https://huggingface.co/AIDC-AI/Ovis1.6-Gemma2-9B) on a 3090 without issues. Ubuntu, ~500 MB of VRAM in use before loading the model, ~21.7 GB during inference.
And it is very good!
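If you want to verify numbers like the ~21.7 GB above from inside the script rather than watching nvidia-smi, plain PyTorch counters are enough; nothing Ovis-specific is assumed here:

import torch

torch.cuda.reset_peak_memory_stats()

# ... run the HF demo inference from the model card here ...

peakAlloc = torch.cuda.max_memory_allocated() / 1024**3
peakReserved = torch.cuda.max_memory_reserved() / 1024**3
print(f"peak allocated: {peakAlloc:.1f} GiB, peak reserved: {peakReserved:.1f} GiB")
# nvidia-smi usually shows a bit more than this (CUDA context plus allocator cache).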
Could anyone please advise whether it is possible to run inference with Ovis 1.6 on a single 4090 GPU? After loading, the model appears to consume approximately 20 GB of VRAM. I attempted inference, but the demo exited due to insufficient memory. Are there any solutions to this issue?