JayZhang42 / FederatedGPT-Shepherd

Shepherd: A foundational framework enabling federated instruction tuning for large language models
https://arxiv.org/pdf/2305.05644.pdf
Apache License 2.0

Why is the training time so long? #7

Open · opened by mid2doubao 6 months ago

mid2doubao commented 6 months ago

I use the command below with two NVIDIA TITAN RTX GPUs, and it takes 20+ hours to train the model:

    python main.py --global_model 'chavinlo/alpaca-native' \
        --data_path "./data" \
        --output_dir './lora-shepherd-7b/' \
        --num_communication_rounds 10 \
        --num_clients 10 \
        --train_on_inputs \
        --group_by_length
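For rough intuition, federated training simulated on one machine runs every selected client's local update sequentially, so wall-clock time grows with rounds × clients × local steps. The sketch below is a back-of-envelope estimate only; the step count and seconds-per-step are assumed illustrative values, not measurements from this repo or the paper.

```python
# Rough back-of-envelope estimate of total wall-clock time for federated
# instruction tuning simulated sequentially on a single machine.
# All concrete numbers below are assumptions for illustration.

def estimate_total_hours(num_rounds: int,
                         clients_per_round: int,
                         local_steps_per_client: int,
                         seconds_per_step: float) -> float:
    """Clients run one after another, so time adds up linearly."""
    total_seconds = (num_rounds * clients_per_round
                     * local_steps_per_client * seconds_per_step)
    return total_seconds / 3600.0

if __name__ == "__main__":
    # Hypothetical values: 10 rounds x 10 clients, ~500 local steps each,
    # ~1.5 s per step on a single consumer GPU.
    hours = estimate_total_hours(num_rounds=10,
                                 clients_per_round=10,
                                 local_steps_per_client=500,
                                 seconds_per_step=1.5)
    print(f"Estimated wall-clock time: {hours:.1f} h")
```

With those assumed values the estimate comes out to roughly 21 hours, which is in the same ballpark as the 20+ hours observed, so long runtimes are expected when all clients are trained sequentially on one machine.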

lxr-1204 commented 6 months ago

I trained with the same command as you, on a single RTX 3090 (24 GB). It took approximately 14 hours, and GPU memory usage was around 14 GB, not the 23 GB mentioned in the paper. May I ask what the GPU memory usage is on your system? Can the settings the author provides in the paper be translated directly into runnable code? Were you able to reproduce the author's results?
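If it helps with comparing memory numbers: a minimal, generic PyTorch sketch for reporting peak GPU memory is shown below. It uses only standard `torch.cuda` memory APIs and is not code from the Shepherd repository; where you call it in your own training loop is up to you.

```python
# Minimal sketch for checking peak GPU memory with PyTorch.
# Generic utility, not part of the Shepherd codebase.
import torch

def report_peak_memory(device: int = 0) -> None:
    """Print peak allocated and reserved GPU memory in GiB."""
    allocated = torch.cuda.max_memory_allocated(device) / 1024 ** 3
    reserved = torch.cuda.max_memory_reserved(device) / 1024 ** 3
    print(f"cuda:{device} peak allocated: {allocated:.1f} GiB, "
          f"peak reserved: {reserved:.1f} GiB")

# Usage: call torch.cuda.reset_peak_memory_stats() before training starts,
# then report_peak_memory() after training (or after each communication round).
```

Note that peak allocated memory can be noticeably lower than what `nvidia-smi` shows, since the CUDA caching allocator reserves extra memory; that alone can explain part of a 14 GB vs. 23 GB discrepancy, along with differences in batch size, sequence length, and precision.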