PaddlePaddle / PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.
Apache License 2.0
301 stars 117 forks

Multi-GPU error during full-parameter SFT fine-tuning #577

Open fangfangssj opened 3 months ago

fangfangssj commented 3 months ago

The environment is the Paddle 2.6.1 Docker image (nvidia-docker pull registry.baidubce.com/paddlepaddle/paddle:2.6.1-gpu-cuda12.0-cudnn8.9-trt8.6), with dependencies installed via pip from requirements.txt. The hardware is 4x NVIDIA A100.

Multi-GPU training fails with an error, but single-GPU training runs normally.

LokeZhou commented 2 months ago

This kind of error is usually caused by insufficient GPU memory.
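
One rough way to test the out-of-memory hypothesis is to watch per-GPU memory while the multi-GPU job is running. The sketch below is not from this thread. It assumes `nvidia-smi` is available on the PATH inside the container, and the helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, chosen only for illustration.

```python
import subprocess

# Standard nvidia-smi query flags; values are reported in MiB with nounits.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse lines of 'index, used_MiB, total_MiB' into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append(
            {
                "index": index,
                "used_mib": used,
                "total_mib": total,
                "free_mib": total - used,
            }
        )
    return rows


def query_gpu_memory():
    """Run nvidia-smi once and return the parsed per-GPU memory usage."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` from a second shell shortly before the crash shows whether any of the four cards is close to its total. If one is, reducing the per-device batch size is a common first thing to try.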