ai-computing / aicomp


CPU memory OOM during large model training #16

Open ememos opened 2 months ago

ememos commented 2 months ago

When 8 processes are launched on a single server with torchrun and each process calls from_pretrained() for a GPT-J 6B scale model, a CPU-memory-level OOM occurs.
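A rough back-of-envelope estimate shows why this launch pattern exhausts host RAM. The numbers below are assumptions, not measurements from this repo: they assume an fp32 checkpoint and that each default from_pretrained() call transiently holds roughly two copies of the weights (the loaded state dict plus the freshly initialized model).

```python
# Rough CPU-memory estimate for 8 simultaneous from_pretrained() calls.
# Assumptions (not measured): fp32 weights, ~2 in-memory copies per process.
PARAMS = 6_000_000_000      # GPT-J, ~6B parameters
BYTES_PER_PARAM = 4         # fp32
COPIES = 2                  # state dict + initialized model
PROCESSES = 8               # one per GPU under torchrun

per_process_gb = PARAMS * BYTES_PER_PARAM * COPIES / 1e9
total_gb = per_process_gb * PROCESSES
print(f"{per_process_gb:.0f} GB per process, {total_gb:.0f} GB total")
# → 48 GB per process, 384 GB total
```

Even halving the copy count still leaves a peak far above the RAM of most single servers, which matches the observed OOM.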

ememos commented 2 months ago

Logic to handle the CPU-memory OOM during the Hugging Face model parameter download stage has been incorporated into the code (using the cache_dir option and a barrier at the example-code level). Additionally, logic to handle CPU-memory OOM during the IR analysis phase has been added at the engine level (IR_Anal: SEQUENTIAL/PARALLEL/SINGLE options). SEQUENTIAL is the most memory-efficient, while SINGLE is still experimental.
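The "barrier at the example code level" presumably follows the common rank-0-first pattern: rank 0 downloads into the shared cache_dir while the other ranks wait at a barrier, then everyone loads from the warm cache. A minimal sketch of that ordering, with the loader and barrier passed in as callables (the function name and structure are illustrative, not this repo's actual API):

```python
from typing import Callable

def rank_zero_first(rank: int,
                    barrier: Callable[[], None],
                    load: Callable[[], object]) -> object:
    """Run `load` on rank 0 before all other ranks.

    The barrier is hit twice in total: non-zero ranks wait at it until
    rank 0 has finished downloading, and rank 0 reaches it only after
    its own load, releasing the waiters to read from the local cache.
    """
    if rank != 0:
        barrier()          # wait for rank 0 to populate cache_dir
    result = load()        # non-zero ranks now hit the local cache only
    if rank == 0:
        barrier()          # release the waiting ranks
    return result
```

In a real torchrun job, `barrier` would be `torch.distributed.barrier` and `load` would wrap `AutoModelForCausalLM.from_pretrained(model_name, cache_dir=...)`, so only one process touches the network and peak simultaneous CPU usage drops.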