Closed babytdream closed 4 months ago
What command did you run? Did you make any modifications to the scripts?
Do you have 16 GPUs in one machine or 2 machines with 8 GPUs each?
@carmocca I have 16 GPUs in one machine; here is the GPU info:
The command is:
First, I downloaded llama2-70b-chat and converted it to Hugging Face format using this script. Then I used the command below, as described in download_llama_2.md:
python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/meta-llama/Llama-2-70b-chat-hf
Then I use this command:
python chat/base.py --checkpoint_dir checkpoints/meta-llama/Llama-2-70b-chat-hf
The error is: log1.txt
I think the model file lit_model.pth should be split, and torch.nn.DataParallel should be inserted to support multiple GPUs. Do you have any ideas? Thanks!
Oh yes, the chat script doesn't support multi-GPU at the moment: https://github.com/Lightning-AI/lit-gpt/blob/e83c068afc13dd84fd628a8da235cfcfa49a1193/chat/base.py#L149
However, you can use the generate/base.py
script which does support it: https://github.com/Lightning-AI/lit-gpt/blob/main/tutorials/inference.md#run-a-large-model-on-multiple-smaller-devices
@carmocca
Hi, I ran your command:
python generate/base.py --checkpoint_dir /data/model/Llama-2-70b-chat-hf/ --strategy fsdp --devices 16
But an error appears: log1.txt. Is there a bug?
Can you also do https://github.com/Lightning-AI/lit-gpt/issues/432#issuecomment-1682259981? This is a known issue from a recent Fabric update
@carmocca
Hi, I ran your command:
python generate/base.py --checkpoint_dir /data/model/Llama-2-70b-chat-hf/ --strategy fsdp --devices 12
But it has been running for more than an hour, which doesn't seem normal. Here are the logs:
Loading model '/data/model/Llama-2-70b-chat-hf/lit_model.pth' with {'org': 'meta-llama', 'name': 'Llama-2-70b-chat-hf', 'block_size': 4096, 'vocab_size': 32000, 'padding_multiple': 64, 'padded_vocab_size': 32000, 'n_layer': 80, 'n_head': 64, 'n_embd': 8192, 'rotary_percentage': 1.0, 'parallel_residual': False, 'bias': False, 'n_query_groups': 8, 'shared_attention_norm': False, '_norm_class': 'RMSNorm', 'norm_eps': 1e-05, '_mlp_class': 'LLaMAMLP', 'intermediate_size': 28672, 'condense_ratio': 1}
Time to instantiate model: 0.05 seconds.
Time to load the model weights: 546.27 seconds.
[rank: 11] Global seed set to 1234
[rank: 1] Global seed set to 1234
[rank: 0] Global seed set to 1234
[rank: 5] Global seed set to 1234
[rank: 3] Global seed set to 123
The "546.27 seconds" loading time also seems very long; in my experience it usually shouldn't take more than a minute. Do you happen to have the weights on an S3 bucket or something like that?
Oh sorry, I just compared it to the 7B model, which took ~1 min to load. I once observed it taking longer (10 min) when the .pth file was on an S3 bucket.
I don't know the 70B loading times off the top of my head, sorry!
Same question as #456.
Multi-GPU inference is now supported: https://github.com/Lightning-AI/lit-gpt/blob/main/tutorials/inference.md#run-a-large-model-on-multiple-smaller-devices
This error appears in another project when I use 16 A10s (16 × 23 GB) to run inference with Llama2-70B:
I have asked many people to solve this problem, but without success. I know it works with 8 GPUs! But I need to increase the prompt length of Llama 2, and 8 GPUs are not enough. Do you have any ideas? Can this be solved in this project? Thanks!
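For rough sizing, a back-of-envelope estimate of why 8 × 23 GB is tight for a 70B model and how sharding across 16 devices changes the picture (the numbers are illustrative; activation and KV-cache overhead is deliberately left out, and it grows with prompt length):

```python
# Back-of-envelope memory estimate for Llama 2 70B inference.
PARAMS = 70e9          # parameter count
BYTES_PER_PARAM = 2    # bf16 / fp16

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9  # ~140 GB of weights total

for devices in (8, 16):
    # FSDP shards the parameters roughly evenly across devices.
    per_gpu_weights = weights_gb / devices
    print(f"{devices} GPUs: ~{per_gpu_weights:.1f} GB of weights per GPU")
```

With 8 × 23 GB A10s (184 GB total) the ~140 GB of weights barely fit, and a longer prompt inflates the KV cache on top of that; 16 devices (368 GB total) leave much more headroom per GPU, which is consistent with 8 GPUs failing on long prompts while 16 do not.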