BlinkDL / RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
Apache License 2.0

OOM issue for both CPU and GPU in 1B5 model training #117

Closed · fubincom closed this issue 1 year ago

fubincom commented 1 year ago

Hello, I'm trying to fine-tune the RWKV 1B5 model with this command:

```
train.py --load_model ${FILE_DIR}/gpt_model/RWKV-4-Raven-1B5-v11.pth --wandb "" --proj_dir "out" \
  --data_file ${FILE_DIR}/train.npy --data_type "numpy" --vocab_size 50277 \
  --ctx_len 1024 --epoch_steps 5 --epoch_count 5 --epoch_begin 0 --epoch_save 2 \
  --micro_bsz 2 --n_layer 24 --n_embd 2048 --pre_ffn 0 --head_qk 0 \
  --lr_init 1e-5 --lr_final 1e-5 --warmup_steps 0 --beta1 0.9 --beta2 0.999 --adam_eps 1e-8 \
  --precision fp16 --strategy deepspeed_stage_2_offload --accelerator gpu --grad_cp 1 --devices 8
```

My GPUs are V100s with 32 GB of GPU memory each, and each machine has 40 GB of CPU memory. With deepspeed_stage_2_offload, training only survives about 2 steps before it is killed by a CPU out-of-memory error (even if I set more steps per epoch, it is still killed at the second step). Is something wrong with my experiment settings, or do I simply need more CPU memory for this?
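For reference, a rough back-of-envelope sketch of why stage-2 optimizer offload can exhaust 40 GB of host RAM for a 1.5B-parameter model. The byte counts below are illustrative assumptions, not exact DeepSpeed accounting:

```python
# Hedged estimate of CPU RAM consumed by ZeRO stage-2 optimizer offload.
# Assumption: Adam keeps fp32 master weights, momentum, and variance on the CPU,
# plus an fp32 gradient partition for the optimizer step, i.e. roughly
# 16 bytes per parameter spread across the node, before pinned communication
# buffers and per-rank Python/dataloader overhead are counted.

n_params = 1.5e9          # RWKV 1B5
bytes_per_param = 16      # 4 (fp32 weights) + 4 (momentum) + 4 (variance) + 4 (fp32 grads)

offload_gb = n_params * bytes_per_param / 1024**3
print(f"~{offload_gb:.0f} GB of CPU RAM just for offloaded optimizer state")
# ~22 GB on a 40 GB machine leaves little headroom for 8 training processes,
# pinned buffers, and the dataset, so a CPU OOM is plausible.
```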

Really appreciate your help.

[Screenshot: 2023-05-18 at 13:08:08]
BlinkDL commented 1 year ago

Use --strategy deepspeed_stage_2. You have enough VRAM, so don't use offload.
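For example, assuming the rest of the flags stay the same as in the original command, the launch would become:

```
train.py --load_model ${FILE_DIR}/gpt_model/RWKV-4-Raven-1B5-v11.pth --wandb "" --proj_dir "out" \
  --data_file ${FILE_DIR}/train.npy --data_type "numpy" --vocab_size 50277 \
  --ctx_len 1024 --epoch_steps 5 --epoch_count 5 --epoch_begin 0 --epoch_save 2 \
  --micro_bsz 2 --n_layer 24 --n_embd 2048 --pre_ffn 0 --head_qk 0 \
  --lr_init 1e-5 --lr_final 1e-5 --warmup_steps 0 --beta1 0.9 --beta2 0.999 --adam_eps 1e-8 \
  --precision fp16 --strategy deepspeed_stage_2 --accelerator gpu --grad_cp 1 --devices 8
```

With plain deepspeed_stage_2 the optimizer states stay partitioned across the GPUs instead of being offloaded to host RAM, so the 40 GB of CPU memory is no longer the bottleneck; with 8x 32 GB V100s there should be enough VRAM headroom for a 1.5B model.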