A question about SFT

Coobiw / MPP-LLaVA

Personal Project: MPP-Qwen14B & MPP-Qwen-Next (Multimodal Pipeline Parallel based on Qwen-LM). Supports [video/image/multi-image] {sft/conversations}. Don't let poverty limit your imagination! Train your own 8B/14B LLaVA-style MLLM on an RTX3090/4090 24GB.

349 stars, 19 forks

A question about SFT #26

Open df2046df opened 2 months ago

df2046df commented 2 months ago

I ran out of system memory while running SFT:
RuntimeError: [enforce fail at alloc_cpu.cpp:83] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 15132180480 bytes. Error code 12 (Cannot allocate memory)
What could be causing this? This is my first time working with SFT and I don't know much about it, so I'd appreciate an explanation.

Coobiw commented 2 months ago

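You're probably running SFT for the 7B version, right? When my SFT code loads the model, it is first loaded onto the CPU and then distributed to the corresponding GPUs via pipeline parallelism. My guess is that your machine doesn't have enough CPU RAM to hold a 7–8B model in half precision (a 7–8B model in half precision needs roughly 14–16 GB).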
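A quick way to sanity-check that estimate: half precision (fp16/bf16) stores each parameter in 2 bytes, so 7–8B parameters come to about 14–16 GB, and the 15132180480-byte allocation in the error falls inside that range. The function name fp16_weight_bytes is only illustrative, and the calculation counts the LLM weights alone.

```python
GB = 10**9  # decimal gigabytes


def fp16_weight_bytes(num_params: int) -> int:
    """Bytes needed to store num_params weights in half precision (2 bytes each)."""
    return num_params * 2


# Parameter range mentioned in the reply: 7B to 8B
for billions in (7, 8):
    size_gb = fp16_weight_bytes(billions * 10**9) / GB
    print(f"{billions}B params -> {size_gb:.0f} GB")

# Size of the single allocation reported in the RuntimeError
requested = 15_132_180_480
print(f"requested: {requested / GB:.2f} GB")
```

This prints 14 GB and 16 GB for the two model sizes and about 15.13 GB for the failed allocation. Anything else held in memory while loading (other model components, temporary copies) would come on top of this, so the actual peak may be higher.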

df2046df commented 2 months ago


Is it possible to load the model directly onto the GPU instead? My machine's CPU RAM probably can't meet that requirement.

Coobiw commented 2 months ago

https://github.com/Coobiw/MiniGPT4Qwen/blob/master/lavis/models/minigpt4qwen_models/minigpt4qwen.py#L150

Try hard-coding device_map="cuda" there.
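A minimal sketch of what that change might look like, assuming the linked line loads the language model through Hugging Face transformers' from_pretrained (the actual code in minigpt4qwen.py may differ). Passing a single device string such as "cuda" as device_map maps the whole model onto that device, and setting device_map should also switch transformers to its low_cpu_mem_usage loading path, so a full extra copy of the weights is not built in CPU RAM first. This requires the accelerate package. The helper names llm_load_kwargs and load_llm are hypothetical.

```python
def llm_load_kwargs(load_directly_to_gpu: bool) -> dict:
    """Hypothetical helper: extra keyword arguments for from_pretrained."""
    kwargs = {}
    if load_directly_to_gpu:
        # A single device string places every module on that device
        # instead of staging the full model in CPU RAM.
        kwargs["device_map"] = "cuda"
    return kwargs


def load_llm(model_path: str, load_directly_to_gpu: bool = True):
    """Hypothetical loader; model_path is a local checkpoint directory or hub id."""
    # Imported lazily so llm_load_kwargs can be used without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.float16,  # half precision, as in the size estimate
        **llm_load_kwargs(load_directly_to_gpu),
    )
```

With device_map="cuda", the whole half-precision model goes to the current CUDA device, so that GPU needs roughly the 14–16 GB estimated in the earlier reply to be free at load time.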