feifeibear / LLMSpeculativeSampling

Fast inference from large language models via speculative decoding
Apache License 2.0
530 stars 51 forks

Question about specifying the device #21

Closed pendulum445 closed 1 year ago

pendulum445 commented 1 year ago

https://github.com/feifeibear/LLMSpeculativeSampling/blob/1da363e9d2201663577aa2d90074853e5fda7812/main.py#L82

Should `.to(torch_device)` be added where the model is loaded here? Without it, the following error is raised: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:3 and cuda:1! (when checking argument for argument index in method wrapper_CUDA__index_select)

taoxunqiang commented 10 months ago

Has this problem been solved?

pendulum445 commented 10 months ago

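"Has this problem been solved?" Removing `device_map="auto",` and then adding `.to(device)` should solve it. I haven't looked at this in a long time, so I don't remember it clearly.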
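
The workaround described above can be sketched roughly as follows. This is a minimal sketch, not the repository's actual code: it assumes the models are loaded with `AutoModelForCausalLM.from_pretrained` from Hugging Face Transformers, and the helper names `single_device_kwargs` and `load_on_device` are hypothetical, introduced only for illustration.

```python
def single_device_kwargs(load_kwargs):
    """Return a copy of the from_pretrained kwargs without `device_map`.

    With device_map="auto" the layers may be spread over several GPUs;
    dropping it lets us place the whole model on one device ourselves.
    """
    cleaned = dict(load_kwargs)
    cleaned.pop("device_map", None)
    return cleaned


def load_on_device(model_name, device, **load_kwargs):
    """Load a causal LM and move all of its weights to a single device.

    Hypothetical helper: assumes the `transformers` package is installed.
    The import is done lazily so the helper above stays usable without it.
    """
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        model_name, **single_device_kwargs(load_kwargs)
    )
    # .to(device) puts every parameter on the same device as the input tensors.
    return model.to(device)
```

A call might look like `load_on_device(model_name, "cuda:0", trust_remote_code=True)`, used for both the small and the large model, with the input tensors moved to the same device. Since the whole model then sits on one device, it has to fit in that device's memory.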

taoxunqiang commented 10 months ago

Yes, it runs with that change, but it seems multi-GPU inference can no longer be used.