I am running into a CUDA device problem. The target LLM is loaded with
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto", torch_dtype=torch.bfloat16)
so it is dispatched across multiple GPUs via accelerate hooks. But softopt.py then runs
output = model(inputs_embeds=before_embeds, use_cache=True)
and since the before_embeds tensor lives on a single GPU, this line raises:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3!
Did you load the model on a single GPU in your implementation?