Fine-tuning Qwen runs out of memory - GitHub Issues

TideDra / VL-RLHF

A RLHF Infrastructure for Vision-Language Models

Apache License 2.0

85 stars 5 forks

[Open] delian11 opened this issue 2 months ago

delian11 commented 2 months ago

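Hello, I am fine-tuning Qwen with the original code on 2× A100 80G. GPU memory usage on each card is only 919 MB, but host RAM usage keeps growing (apparently during data loading?) until it passes 180 GB, at which point memory runs out and the program is killed. How can I fix this? Training log: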

Memory usage:

TobiasLee commented 1 month ago

Which size of Qwen?

delian11 commented 1 month ago

> Which size of Qwen?

qwen-vl, 7b

TobiasLee commented 1 month ago

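Could you try lowering bsz? Its vocabulary is around 100k tokens, so the final activations (the logits) are very large. Try bsz=1 and see whether it runs. As I recall, an 80G card can handle per_device_batch_size=4; then adjust gradient_accumulation_step to keep the same global_batch_size.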
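The batch-size advice above can be made concrete with some back-of-the-envelope arithmetic. This is an illustrative sketch only: the sequence length (2048), bf16 logits (2 bytes per element), the target global batch size of 16, and the helper names are hypothetical assumptions, not values from this thread or from VL-RLHF; the vocabulary size uses the thread's rough ~100k figure. It estimates the size of the final logits tensor (batch × sequence length × vocabulary) and picks a gradient accumulation step count that keeps the global batch size fixed under data-parallel training.

```python
# Hypothetical helpers for rough memory / batch-size arithmetic.
# Assumptions (not from the thread): seq_len=2048, bf16 logits, target global batch 16.


def logits_bytes(batch_size: int, seq_len: int, vocab_size: int, bytes_per_elem: int = 2) -> int:
    """Size in bytes of a logits tensor shaped (batch, seq_len, vocab)."""
    return batch_size * seq_len * vocab_size * bytes_per_elem


def global_batch_size(per_device_batch_size: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Samples consumed per optimizer step in data-parallel training."""
    return per_device_batch_size * grad_accum_steps * num_gpus


def grad_accum_for(target_global: int, per_device_batch_size: int, num_gpus: int) -> int:
    """Accumulation steps needed to reach target_global with the given per-device batch."""
    per_step = per_device_batch_size * num_gpus
    if target_global % per_step != 0:
        raise ValueError("target_global must be divisible by per_device_batch_size * num_gpus")
    return target_global // per_step


if __name__ == "__main__":
    for bsz in (1, 4):
        gib = logits_bytes(bsz, seq_len=2048, vocab_size=100_000) / 2**30
        print(f"per-device bsz={bsz}: logits ~ {gib:.2f} GiB")
    # Keep a (hypothetical) global batch of 16 on 2 GPUs when dropping bsz from 4 to 1.
    print(grad_accum_for(16, per_device_batch_size=4, num_gpus=2))  # 2
    print(grad_accum_for(16, per_device_batch_size=1, num_gpus=2))  # 8
```

Because the logits grow linearly with the per-device batch size, going from 4 to 1 shrinks that tensor by 4×, and raising the accumulation steps by the same factor (2 to 8 here) leaves the number of samples per optimizer step unchanged.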