Closed: Leosgp closed this issue 6 months ago
Thank you for your interest in PiSSA. Could you provide more detailed error information? Additionally, if you need to train on multiple GPUs, it is recommended to include these two flags in your script, which will help reduce the memory required:
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
Thank you for replying. This is the error information:
File "train.py", line 216, in
This time I used TinyLlama on a single A100, and it also reported the memory issue.
Please use FSDP full_shard mode, for example:
torchrun --nproc_per_node=4 --master_port=
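For reference, a minimal sketch of what the complete launch could look like when the flags suggested above are combined with torchrun; the GPU count and port value here are placeholders, not values from this thread, and the remaining training arguments are omitted:
torchrun --nproc_per_node=4 --master_port=29500 train.py \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
... (other training arguments as usual)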
Hello, I would like to know why, when I use PiSSA with TinyLlama or LLaMA-7B on three A100s, it reports insufficient GPU memory. I did the same with TinyLlama; here are my parameters:
CUDA_VISIBLE_DEVICES=3,5,7 yes | head -n 3 | python train.py \
--model_name_or_path /home/algo/pretrain_model/TinyLlama-1.1B-Chat-v1.0 \
--data_path data/train-00000-of-00005-a1278ede4e8c5cdb.json \
--dataset_split train[:10000] \
--dataset_field instruction output \
--output_dir /home/algo/zhengkaiyuan/b \
--init_lora_weights pissa \
--report_to wandb \
--merge_and_save True \
--bf16 True \
--num_train_epochs 1 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 12 \
--save_strategy "steps" \
--save_steps 10000 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True
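Applying the suggestion from above to this command would look roughly like the following. This is only a sketch, assuming the three GPUs listed in CUDA_VISIBLE_DEVICES and an arbitrary master port; all other arguments would stay as in the command above:
CUDA_VISIBLE_DEVICES=3,5,7 torchrun --nproc_per_node=3 --master_port=29500 train.py \
--model_name_or_path /home/algo/pretrain_model/TinyLlama-1.1B-Chat-v1.0 \
--data_path data/train-00000-of-00005-a1278ede4e8c5cdb.json \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
... (remaining arguments as above)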