pendulum445 closed 1 year ago
https://github.com/feifeibear/LLMSpeculativeSampling/blob/1da363e9d2201663577aa2d90074853e5fda7812/main.py#L82
Should .to(torch_device) be added where the model is loaded? Without it, I get this error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:1! (when checking argument for argument index in method wrapper_CUDA__index_select)
Has this issue been resolved?
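"Has this issue been resolved?" Removing device_map="auto", and then adding .to(device) should fix it. I haven't looked at this in a long time, so I don't remember the details exactly.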
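The suggestion above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the helper names resolve_device and load_on_single_device are hypothetical, and the sketch assumes the model is small enough to fit on one GPU.

```python
def resolve_device(cuda_available: bool, gpu_index: int = 0) -> str:
    """Pick ONE device string to share between the models and all input tensors."""
    return f"cuda:{gpu_index}" if cuda_available else "cpu"


def load_on_single_device(model_name: str, device: str):
    """Load a causal LM entirely on `device` (hypothetical helper).

    device_map="auto" is deliberately NOT passed, so the weights are not
    spread across several GPUs; the whole model is moved with .to(device).
    """
    # Imported lazily so resolve_device can be used without these libraries.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(model_name)
    return model.to(device)


# Typical use (not executed here):
#   import torch
#   device = resolve_device(torch.cuda.is_available())
#   model = load_on_single_device("<model name>", device)
#   input_ids = input_ids.to(device)  # inputs must sit on the same device
```

With every model and every input tensor on the same device, the "Expected all tensors to be on the same device" check should no longer trip.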
Yes, it runs with that change, but it seems multi-GPU inference is no longer possible.
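One possible way to keep multi-GPU sharding is to leave device_map="auto" in place and move tensors explicitly at each model boundary. The error message is consistent with an embedding lookup whose token ids sit on a different GPU than the embedding weights, so the sketch below moves the ids to the embedding's device before the forward pass and brings the logits back afterwards. The helper names are hypothetical, and whether this is enough for this repository's sampling loop is an assumption that has not been verified in this thread.

```python
def load_sharded(model_name: str):
    """Load a causal LM spread over the visible GPUs (hypothetical helper).

    device_map="auto" needs the accelerate package to be installed.
    """
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")


def input_device(model):
    """Return the device that holds the model's input embedding weights."""
    return model.get_input_embeddings().weight.device


def forward_sharded(model, input_ids):
    """Run one forward pass and return logits on the caller's device."""
    # Token ids must live where the embedding table lives.
    ids = input_ids.to(input_device(model))
    logits = model(ids).logits
    # A sharded model may produce logits on another GPU; moving them back
    # lets the caller combine outputs from two models on one device.
    return logits.to(input_ids.device)
```

The extra copies between devices add some overhead on every step, which is the price for keeping models that are too large for a single GPU.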