SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

Are there plans to support an OpenAI-style API? #155

Open shmily91 opened 8 months ago

shmily91 commented 8 months ago

1. Are there plans to support an OpenAI-style API?
2. Can requests be processed concurrently, or are there plans to support concurrent processing?
3. How long does it take to train the adapted Mistral-7B model? How difficult and costly would it be for an individual to train such an adapted model?
4. Are there plans to grow the ecosystem around this inference framework?

hodlen commented 7 months ago

Sorry for the late reply.

  1. Currently, examples/server should provide an OpenAI API compatibility layer; if you encounter specific issues, we welcome your feedback. (A usage sketch follows this list.)
  2. For parallel inference over the same prompt, you can use examples/batched. For different prompts, the --cont-batching option of examples/server may help, but we do not recommend it, because in our testing it generated clearly incorrect results. On the operator side, we have recently optimized for batch sizes greater than 1, falling back to dense computation at very high batch sizes. We expect the speed to be no slower than a dense model under all conditions. (A concurrency sketch also follows this list.)
  3. We have released the adapted Mistral-7B model, named Bamboo. To restore model performance, we performed recovery training and further pretraining on approximately 200B tokens, which would be infeasible for an individual.
  4. Yes, we are pursuing both model adaptation and platform adaptation; for details, please see our Kanban board. As model availability and the inference framework's support mature, we will more actively build or join the upstream and downstream ecosystem, with PowerInfer as the starting point.
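
Below is a minimal sketch of querying the examples/server compatibility layer with the official openai Python client. The port, endpoint path, and model name are assumptions for illustration, not values confirmed by the project; check the server binary's --help for the real flags.

```python
# Minimal sketch: calling the examples/server OpenAI-compatible layer.
# Assumes the server is already running locally on port 8080 (hypothetical);
# your actual host, port, and endpoint path may differ.
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8080/v1",  # assumed local endpoint
    api_key="not-needed",                 # a local server typically ignores the key
)

resp = client.chat.completions.create(
    model="bamboo-7b",  # placeholder name; the server may ignore this field
    messages=[{"role": "user", "content": "Explain activation sparsity in one sentence."}],
)
print(resp.choices[0].message.content)
```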
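
And a rough sketch of issuing different prompts concurrently against that same server, on the assumption that it was launched with --cont-batching. Per the caveat in item 2, treat this mode as experimental; the snippet only shows the client-side pattern.

```python
# Sketch: overlapping requests with different prompts so the server's
# continuous batching (--cont-batching, experimental per the reply above)
# can interleave their decoding. Endpoint and model name are assumed.
from concurrent.futures import ThreadPoolExecutor

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="not-needed")

prompts = [
    "What does PowerInfer do?",
    "Summarize continuous batching in one line.",
    "Define activation sparsity.",
]

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="bamboo-7b",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# A thread pool keeps several requests in flight at once.
with ThreadPoolExecutor(max_workers=len(prompts)) as pool:
    for answer in pool.map(ask, prompts):
        print(answer)
```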