Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
https://otter-ntu.github.io/
MIT License

About GPU memory? #249

Open · zuwenqiang opened this issue 1 year ago

Hi! I'm using two A100 GPUs, each with 40GB of memory. During training, GPU memory utilization reaches over 90% on both A100s.

[screenshot: GPU memory utilization on both A100s]

Is this normal? Here's the training command:

```bash
accelerate launch --config_file=./pipeline/accelerate_configs/accelerate_config_fsdp.yaml \
  pipeline/train/instruction_following.py \
  --pretrained_model_name_or_path=luodian/OTTER-LLaMA7B-INIT \
  --mimicit_path="path/to/SN_instruction.json" \
  --images_path="path/to/SN.json" \
  --batch_size=4 \
  --num_epochs=9 \
  --report_to_wandb \
  --wandb_entity=ntu-slab \
  --run_name=OTTER-LLaMA7B-densecaption \
  --wandb_project=OTTER-LLaMA7B \
  --workers=1 \
  --lr_scheduler=cosine \
  --learning_rate=1e-5 \
  --warmup_steps_ratio=0.01
```

And here's the accelerate config (accelerate_config_fsdp.yaml):

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: no
downcast_bf16: false
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: false
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
main_process_port: 20687
```

zuwenqiang commented 1 year ago

Also, during training the log prints "Using dtype torch.float32."

Luodian commented 1 year ago

I think the GPU usage looks normal, but you could try the deepspeed_zero2 config and set num_processes=2.
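
For reference, here is a minimal sketch of what such a ZeRO-2 accelerate config could look like. This is an assumption based on accelerate's standard DeepSpeed integration, not the repo's actual file; if the keys differ, the config shipped under pipeline/accelerate_configs/ takes precedence:

```yaml
# Sketch of a ZeRO-2 accelerate config -- exact keys vary across accelerate versions;
# prefer the deepspeed_zero2 config that ships with the repo.
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  gradient_accumulation_steps: 1
  gradient_clipping: 1.0
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: false
  zero_stage: 2  # ZeRO stage 2 shards optimizer state and gradients across processes
downcast_bf16: false
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 2  # one process per A100
rdzv_backend: static
same_network: false
use_cpu: false
```

Sharding optimizer state and gradients across the two processes is where ZeRO-2's memory savings over single-process training come from.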

Luodian commented 1 year ago

Using ZeRO-2 will be much faster than FSDP if you have 80GB A100s.
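
For completeness, a hedged sketch of relaunching with such a config; the filename accelerate_config_zero2.yaml is hypothetical, so substitute whatever ZeRO-2 config actually ships in pipeline/accelerate_configs/:

```bash
# The config filename is hypothetical; check pipeline/accelerate_configs/ for the real one.
accelerate launch --config_file=./pipeline/accelerate_configs/accelerate_config_zero2.yaml \
  pipeline/train/instruction_following.py \
  --pretrained_model_name_or_path=luodian/OTTER-LLaMA7B-INIT \
  --batch_size=4 \
  --num_epochs=9 \
  --lr_scheduler=cosine \
  --learning_rate=1e-5 \
  --warmup_steps_ratio=0.01
  # ...plus the same --mimicit_path, --images_path, and wandb flags as the FSDP run above
```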