After installing the Python binding, how do I run inference on the CPU only?

QwenLM / qwen.cpp

C++ implementation of Qwen-LM


After installing the Python binding, how do I run inference on the CPU only? #77

Closed zzzcccxx closed 4 months ago

zzzcccxx commented 5 months ago

I used the following code:

from qwen_cpp import Pipeline

# Load the GGML model and its tiktoken vocabulary.
pipeline = Pipeline("../qwen.cpp/qwen1-8b-ggml.bin", "../qwen_1_8b/qwen.tiktoken")

# Stream the reply piece by piece.
result2 = pipeline.chat(["Hello"], stream=True)
for item in result2:
    print(item)

But the output shows that it runs on all GPUs at once. How can I make it run on the CPU only?
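For context, one generic approach with CUDA-based libraries is to hide every GPU from the CUDA runtime by setting the `CUDA_VISIBLE_DEVICES` environment variable to an empty string before the library is imported. The sketch below only shows that mechanism. Whether a GPU-enabled build of `qwen_cpp` then falls back to the CPU or raises an error is an assumption that is not confirmed in this thread, and `hide_cuda_devices` is a hypothetical helper name. If it does not fall back, installing a build of the binding compiled without GPU support may be the more reliable route.

```python
import os


def hide_cuda_devices() -> None:
    """Hide all CUDA devices from libraries loaded later in this process."""
    # An empty value leaves the CUDA runtime with no visible GPUs.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


# Must run before any CUDA-backed library is imported.
hide_cuda_devices()

# Assumption: qwen_cpp is imported only after the variable is set.
# Left commented out so the sketch runs without qwen_cpp installed.
# from qwen_cpp import Pipeline
# pipeline = Pipeline("../qwen.cpp/qwen1-8b-ggml.bin", "../qwen_1_8b/qwen.tiktoken")
```

The same effect can be had from the shell by exporting the variable before starting Python, which avoids any import-order concerns.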