dshwei opened this issue 4 months ago
Did you install flash attention by following the README?
I installed flash attention, but I'm still getting an OutOfMemoryError with the codes-7b model.
> I installed flash attention, but I'm still getting an OutOfMemoryError with the codes-7b model.
Hi, just installing flash attention is not enough; you also need to modify the transformers source code to replace the original attention layer with the flash attention layer.
Here are the instructions for modifying the source code: https://github.com/RUCKBReasoning/codes?tab=readme-ov-file#option-1-modify-the-source-code
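If your transformers version is new enough, you can also request the flash attention layers at load time instead of hand-patching the source. A minimal sketch, assuming transformers >= 4.36 and a working flash-attn install (this is an alternative, not the repo's documented Option 1):

```python
# A minimal sketch, assuming transformers >= 4.36 and flash-attn installed;
# an alternative to manually patching the attention layer in the source.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "seeklhy/codes-7b",
    torch_dtype=torch.bfloat16,               # flash-attn requires fp16/bf16 weights
    attn_implementation="flash_attention_2",  # swap in FlashAttention-2 layers
)
```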
> I installed flash attention, but I'm still getting an OutOfMemoryError with the codes-7b model.

I also encountered this problem. Have your issues been resolved?
Submission script:
```bash
#!/bin/bash
#SBATCH --job-name=sft_sql_codes   # name
#SBATCH --nodes=1                  # nodes
#SBATCH -w wuhan-gpu-[17]
#SBATCH --ntasks-per-node=1        # crucial - only 1 task per dist per node!
#SBATCH --cpus-per-task=8          # number of cores per task
#SBATCH --gres=gpu:8               # number of gpus
#SBATCH --gpus-per-task=8          # number of gpus

export GPUS_PER_NODE=8
export MASTER_ADDR=$(scontrol show hostnames $SLURM_JOB_NODELIST | head -n 1)
export MASTER_PORT=9901
```
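The actual launch line is cut off above. For reference, a typical pattern that consumes these exports looks like the sketch below; `train_causal_lm.py` and its arguments are placeholders, not necessarily the repo's real script name:

```bash
# A sketch only: the flags are real `accelerate launch` options, but the
# script name and its arguments stand in for the truncated command.
srun accelerate launch \
    --num_processes $GPUS_PER_NODE \
    --num_machines $SLURM_NNODES \
    --main_process_ip $MASTER_ADDR \
    --main_process_port $MASTER_PORT \
    train_causal_lm.py --per_device_train_batch_size 4 --block_size 4096
```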
Partial log output from the run:
```
The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `8`
        More than one GPU was found, enabling multi-GPU training.
        If this was unintended please pass in `--num_processes=1`.
    `--num_machines` was set to a value of `1`
    `--mixed_precision` was set to a value of `'no'`
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
accelerator.is_main_process: True
accelerator.device: cuda:0
Namespace(per_device_train_batch_size=4, block_size=4096, seed=42, pretrained_model_name_or_path='seeklhy/codes-7b', epochs=4, lr=5e-06, warmup_ratio=0.05, checkpointing_steps=100000, tensorboard_log_dir='./train_logs', mode='sft', output_ckpt_dir='./ckpts/codes-7b-bird-with-evidence', save_all_states=False, pt_data_dir='./data/corpus.bin', resume_from_checkpoint=None, resume_tag=None, text2sql_data_dir='./sft_bird_with_evidence_train_text2sql.json', table_num=6, column_num=10)
tokens per batch: 131072
sequences per batch: 32
using LLM from: seeklhy/codes-7b
accelerator.is_main_process: False
accelerator.device: cuda:1
accelerator.is_main_process: False
accelerator.device: cuda:5
accelerator.is_main_process: False
accelerator.device: cuda:2
accelerator.is_main_process: False
accelerator.device: cuda:3
accelerator.is_main_process: False
accelerator.device: cuda:6
accelerator.is_main_process: False
accelerator.device: cuda:7
accelerator.is_main_process: False
accelerator.device: cuda:4
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 8.00 GiB. GPU 0 has a total capacty of 79.33 GiB of which 5.53 GiB is free. Including non-PyTorch memory, this process has 73.78 GiB memory in use. Of the allocated memory 66.49 GiB is allocated by PyTorch, and 6.31 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
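Note that the defaults above mean the run used `--mixed_precision 'no'`, i.e. full fp32, and the OOM message itself suggests trying `max_split_size_mb`. A hedged sketch of acting on both hints (the flag values are suggestions, not the repo's documented configuration; `train_causal_lm.py` is again a placeholder):

```bash
# Suggested values only; bf16 assumes the GPUs support it (the 79.33 GiB
# cards in the log look like 80 GB A100/H100s, which do).
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128  # fragmentation hint from the OOM message

accelerate launch \
    --num_processes 8 \
    --num_machines 1 \
    --mixed_precision bf16 \
    --dynamo_backend no \
    train_causal_lm.py --per_device_train_batch_size 4 --block_size 4096
```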