I am running into a CUDA device problem. The target LLM is loaded with
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto", torch_dtype=torch.bfloat16)
so it is dispatched across multiple GPUs via accelerate hooks. But softopt.py then runs
output = model(inputs_embeds=before_embeds, use_cache=True)
and since the before_embeds tensor lives on a single GPU, this line raises:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3!
Did you load the model on a single GPU in your implementation?