Open linyubupa opened 1 year ago
If you have multiple GPUs, the CPU memory cost is roughly 2 × model_size × num_gpus.
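(For context, a minimal sketch of the initialization pattern being described; the model name and module are illustrative, not the reporter's actual code:)

```python
import pytorch_lightning as pl
from transformers import AutoModelForCausalLM

class MyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        # Each GPU gets its own process, and every process runs __init__,
        # so each one materializes a full copy of the weights in CPU RAM.
        self.model = AutoModelForCausalLM.from_pretrained("gpt2")
```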
Hey @linyubupa
This is expected given the way you are initializing the model. I can see from the code snippet that you create the model in `__init__`. This isn't wrong, but for large models like yours it is inefficient. I recommend moving the initialization into this special Lightning hook:
```python
def configure_sharded_model(self):
    self.model = AutoModelForCausalLM.from_pretrained(...)
```
Here is the documentation for working with deepspeed models (and also documentation for configure_sharded_model): https://pytorch-lightning.readthedocs.io/en/stable/advanced/model_parallel.html#shard-model-instantly-to-reduce-initialization-time-memory
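For example, a minimal sketch of the hook inside a LightningModule (the model name is a placeholder):

```python
import pytorch_lightning as pl
from transformers import AutoModelForCausalLM

class MyModule(pl.LightningModule):
    def configure_sharded_model(self):
        # This hook runs after the strategy (e.g. DeepSpeed) has set up its
        # sharding context, so large weights can be partitioned as they are
        # created instead of being fully replicated on every process first.
        self.model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
```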
Please let me know if that helps :)
I had the same problem, but this method didn't solve it
Yeah, I had the same issue, and the above does not solve it.
Sorry for the late reply. I build the model in `configure_sharded_model`, but the CPU memory cost is still just as high.
Same issue. After I put the model initialization into `configure_sharded_model`, I get a new error showing that the loaded parameters are being assigned to empty tensors. It seems the model initialization should go here, but loading the pretrained weights should not.
Did you find a solution for this?
Is there any solution to this? I really need help.
I am facing a similar issue
I think one possible solution is to convert the pretrained model weights to the DeepSpeed ZeRO-3 sharded model format, but I haven't tried it yet.
Is there code to try it out?
@saketsathe @KzZheng
I tried it like this:

```python
# Assumes the usual imports at module level: os, time, torch, deepspeed,
# torch.distributed as dist, tqdm, and
# from transformers import LlamaConfig, LlamaForCausalLM.
# set_adapter / freeze_except_adapter are my own adapter helpers.
def configure_sharded_model(self):
    print("start configure sharded model")
    llamaconfig = LlamaConfig.from_pretrained("decapoda-research/llama-7b-hf")
    self.model = LlamaForCausalLM(llamaconfig)
    self.model.set_adapter(self.adapter_config)
    freeze_except_adapter(self.model, self.adapter_config)

    # list containing a single weight
    params_to_gather = [self.model.model.layers[0].self_attn.q_proj.weight]

    # Runs on every process.
    # Find the named parameters in the checkpoint shards and, if my model has
    # a parameter with the same name, overwrite it.
    # First print this weight's value; later, run the same code to confirm it
    # was changed to 0.

    # check value before the change
    with deepspeed.zero.GatheredParameters(params_to_gather, modifier_rank=0):
        print("\n randomly initialized weight \n",
              self.model.model.layers[0].self_attn.q_proj.weight[0, :5])
    time.sleep(3)

    # 1. load the checkpoint files one by one
    # 2. match the keys between the model parameters and each file
    # 3. assign the values
    # Run on only one GPU (rank 0).
    if torch.distributed.get_rank() == 0:
        with deepspeed.zero.GatheredParameters(params_to_gather, modifier_rank=0):
            self.model.model.layers[0].self_attn.q_proj.weight[0, :5] = 0

    # # check the checkpoint files
    # SHARDED_FILE_PATH = "/home2/leeg/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348"
    # # state dict of my instantiated model
    # # model_named_params = self.model.model.named_parameters()
    # # self.model.get_parameter()
    # # load the checkpoint files
    # PATH_LIST = [os.path.join(SHARDED_FILE_PATH, f"pytorch_model-000{i:02}-of-00033.bin") for i in range(1, 34)]
    # for PATH in tqdm(PATH_LIST):
    #     # load a single checkpoint shard file's state dict
    #     file_state_dict = torch.load(PATH)
    #     # take the named_parameters of my model ...
    #     named_parameters = dict(self.model.model.named_parameters())
    #     # ... and compare them against the keys in the loaded checkpoint file;
    #     # if a key also exists in my model, take the value from the file
    #     params_to_gather = [named_parameters[k] for k in file_state_dict.keys() if k in named_parameters]
    #     # for cp_k, cp_v in file_state_dict.items():
    #     #     if "inv_freq" in cp_k:
    #     #         continue
    #     #     model_p = self.model.model.get_parameter(cp_k)
    #     #     sharded_model_ps_dict[cp_k] = self.model.model.get_parameter(cp_k)
    #     with deepspeed.zero.GatheredParameters(params=params_to_gather, modifier_rank=0):
    #         self.model.model.load_state_dict(file_state_dict, strict=False)

    dist.barrier()

    # Runs on every process.
    # check value after the change
    with deepspeed.zero.GatheredParameters(params_to_gather, modifier_rank=0):
        print("\n barrier weight ",
              self.model.model.layers[0].self_attn.q_proj.weight[0, :5])
```
The problem is that when I use `from_pretrained(...)` in the LightningModule's `configure_sharded_model`, the Lightning DeepSpeed stage 3 strategy interferes with `from_pretrained`'s weight assignment.
So instead of `from_pretrained`, I tried manually assigning the parameter tensors from the sharded checkpoint files to my model's parameters.
I haven't firmly figured all of this out, but I experimented with the code above.
So I think manually loading the pretrained parameter files and overwriting my randomly initialized model parameters inside `deepspeed.zero.GatheredParameters` is a suitable approach.
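For reference, here is the commented-out shard-loading idea above as a cleaned-up sketch (the function name is mine, and `shard_paths` stands in for the list of `pytorch_model-000XX-of-00033.bin` files; untested beyond what the comment above describes):

```python
import torch
import deepspeed
from tqdm import tqdm

def load_hf_shards_into_zero3_model(model, shard_paths):
    """Overwrite a ZeRO-3 partitioned model's weights with the values stored
    in Hugging Face checkpoint shard files, one shard at a time."""
    for path in tqdm(shard_paths):
        # state dict of a single checkpoint shard
        file_state_dict = torch.load(path, map_location="cpu")
        named_parameters = dict(model.named_parameters())
        # gather only the partitioned parameters this shard actually contains
        params_to_gather = [
            named_parameters[k] for k in file_state_dict if k in named_parameters
        ]
        # modifier_rank=0: values assigned on rank 0 are broadcast to the
        # other ranks and re-partitioned when the context exits
        with deepspeed.zero.GatheredParameters(params_to_gather, modifier_rank=0):
            model.load_state_dict(file_state_dict, strict=False)
```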
I solved this by using DeepSpeed init with the transformers Trainer: https://huggingface.co/docs/transformers/main_classes/deepspeed
```
deepspeed --num_gpus 8 --num_nodes 2 --hostfile hostfile --master_addr hostname1 --master_port=9901 \
    your_program.py
```
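For anyone taking the same route, a minimal sketch of the Trainer-side setup (the model name, output_dir, and `my_dataset` are placeholders):

```python
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# minimal ZeRO-3 config; the HF integration fills in the "auto" values
ds_config = {
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto",
}

# Create TrainingArguments *before* from_pretrained: this is what lets
# transformers detect ZeRO-3 and instantiate the model directly in sharded
# form instead of materializing one full copy per process.
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    deepspeed=ds_config,  # accepts a dict or a path to a JSON config file
)

model = AutoModelForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")
trainer = Trainer(model=model, args=args, train_dataset=my_dataset)
trainer.train()
```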
Is there any update on this issue?
update?
same...
any updates?
Bug description
When using a Hugging Face pretrained model with multiple GPUs, the model parameters are duplicated in RAM for every GPU.
How to reproduce the bug
Error messages and logs
Environment
Current environment
```
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning (`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):
```
More info
No response
cc @awaelchli