[Question] Slow Speed of vLLM when evaluating MMLU

PKU-Alignment / align-anything

Align Anything: Training All-modality Model with Feedback

Apache License 2.0


Thank you for your question! This is a known issue. In the current architecture, the BaseInference class is implemented on top of both DeepSpeed and vLLM in the same Python file. As a result, importing the DeepSpeed-related dependencies prevents vLLM from starting properly. As a workaround, I set distributed_executor_backend="ray" when launching vLLM, and this does significantly reduce efficiency. In the next version we will restructure the framework to fully decouple the two backends so that vLLM can run at its full inference speed.
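One common way to decouple two inference backends is to import each backend's module lazily, only when that backend is actually selected. Then running vLLM never pulls DeepSpeed into the process. The sketch below illustrates that general technique under stated assumptions. It is not the planned align-anything change. `BACKENDS`, `load_backend`, the `my_eval.backends.*` module paths, and the class names are all hypothetical placeholders.

```python
import importlib
from typing import Dict, Tuple

# Hypothetical backend registry: backend name -> (module path, class name).
# These module paths are placeholders for illustration, not align-anything's real layout.
BACKENDS: Dict[str, Tuple[str, str]] = {
    "deepspeed": ("my_eval.backends.deepspeed_inference", "DeepSpeedInference"),
    "vllm": ("my_eval.backends.vllm_inference", "VLLMInference"),
}


def load_backend(name: str) -> type:
    """Import and return the inference class registered under `name`.

    Only the requested backend's module is imported, so selecting the vLLM
    backend does not import the DeepSpeed module (and vice versa).
    """
    try:
        module_path, class_name = BACKENDS[name]
    except KeyError:
        raise ValueError(
            f"unknown backend {name!r}; expected one of {sorted(BACKENDS)}"
        ) from None
    # Deferred import: happens here, at selection time, not at module load time.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

Assuming the placeholder modules existed, `VLLMInference = load_backend("vllm")` would import only the vLLM module. With DeepSpeed kept out of the process, it might then be possible to drop the Ray workaround in favor of vLLM's multiprocessing executor (`distributed_executor_backend="mp"`), though whether that works depends on the rest of the setup.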


[Question] Slow Speed of vLLM when evaluating MMLU #35
