When I run the demo provided in read.md, execution fails at the line output_ids = model.generate(**inputs, max_new_tokens=128) with this error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument mask in method wrapper_CUDA__maskedscatter)
My development environment is as follows:
CUDA: 11.8
PyTorch: 2.1.0 and 2.4.1
transformers: 4.45.0.dev0
accelerate: 0.34.2
Hardware: H800 80G
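
For anyone trying to narrow this down, here is a minimal diagnostic sketch. It assumes (the report does not say so) that the model was loaded with a device_map, so that its layers are spread over cuda:0 and cuda:1. The helper names summarize_device_map and move_inputs are hypothetical and not part of transformers or accelerate. The first shows how many modules sit on each device, and the second moves every tensor in the inputs onto one device before calling generate.

```python
from collections import Counter


def summarize_device_map(device_map):
    """Count how many modules are placed on each device (hypothetical helper)."""
    return Counter(str(device) for device in device_map.values())


def move_inputs(inputs, device):
    """Return a copy of `inputs` with every tensor-like value moved to `device`.

    Values without a `.to` method (plain strings, ints, ...) are left untouched.
    """
    return {
        key: value.to(device) if hasattr(value, "to") else value
        for key, value in inputs.items()
    }


# Usage with the demo's objects (assumption: `model` and `inputs` come from the demo):
#   print(summarize_device_map(model.hf_device_map))  # attribute exists only when a device_map was used
#   inputs = move_inputs(inputs, model.device)
#   output_ids = model.generate(**inputs, max_new_tokens=128)
```

This only rules out a mismatch on the input side. If the two tensors involved in the masked_scatter call are both created inside the model's forward pass, moving the inputs may not change anything.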