Hi, I attempted to use speculative decoding but encountered some errors. May I ask for your assistance?
I used the parameters from the first example.
python ./examples/speculative_inference.py \
    --model gpt2-xl \
    --draft_model gpt2 \
    --temperature 0.3 \
    --gamma 5 \
    --max_new_tokens 512 \
    --gpu 0
An error occurred during the first execution: RuntimeError: Expected one of cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, fpga, ort, xla, lazy, vulkan, mps, meta, hpu, mtia, privateuseone device type at start of device string: gpu
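For context, this first error looks like a plain `torch.device` parsing failure: `--gpu 0` presumably selects a device index, but somewhere a device string starting with "gpu" gets built, and "gpu" is not a device type PyTorch accepts. Here is a minimal, hypothetical sketch (not LMFlow's actual code) of the kind of check that produces this message:

```python
# Hypothetical, simplified sketch of the device-string validation that
# torch.device performs; the real accepted list is longer (cpu, cuda, xpu, ...).
VALID_DEVICE_TYPES = {"cpu", "cuda", "mps", "xla"}  # illustrative subset

def parse_device(device_str):
    # The text before ":" is taken as the device type, so "cuda:0" is
    # valid while "gpu" or "gpu:0" is rejected.
    dev_type = device_str.split(":", 1)[0]
    if dev_type not in VALID_DEVICE_TYPES:
        raise RuntimeError(
            f"Expected a valid device type at start of device string: {dev_type}"
        )
    return dev_type

parse_device("cuda:0")  # accepted; parse_device("gpu") would raise RuntimeError
```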
Then I modified HFDecoderModel in hf_decoder_model.py to use cuda, and the following error occurred: NotImplementedError: device "cuda" is not supported
On the third attempt, I changed it to use cpu and got the error: ValueError: The following model_kwargs are not used by the model: ['use_accelerator']
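This last error appears to come from the validation that Hugging Face transformers' `generate()` runs on extra keyword arguments: `use_accelerator` seems to be a framework-specific kwarg being forwarded to a model that does not accept it. A hypothetical sketch of that kind of check (the function name and accepted-kwargs set here are illustrative, not the real API):

```python
# Hypothetical sketch of the unused-kwargs check in transformers' generate();
# any key the model's forward pass does not accept raises a ValueError.
def validate_model_kwargs(accepted_kwargs, model_kwargs):
    unused = [k for k in model_kwargs if k not in accepted_kwargs]
    if unused:
        raise ValueError(
            f"The following model_kwargs are not used by the model: {unused}"
        )

# A model accepting only input_ids/attention_mask would reject use_accelerator:
# validate_model_kwargs({"input_ids", "attention_mask"}, {"use_accelerator": True})
```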
Is there any configuration or environment setting error on my part?