CoeusMaze closed this issue 4 months ago.
This is right. vLLM's version management is not handled very well, which leads to a lot of conflicts. I suggest:
pip install vllm
pip uninstall flash_attn xgboost transformer_engine -y
related issue: https://github.com/vllm-project/vllm/pull/2804
The current batch_inference.py requires the vllm package, but vllm is not listed in requirements.txt and can conflict with the flash_attn package. Is there a way around this other than commenting out the vllm import every time we want to do batch inference with a normal LLM?
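One possible workaround (not from this thread, just a sketch) is to import vllm lazily inside batch_inference.py and fall back to a plain transformers path when vllm is not installed, so nothing has to be commented out. The function name `run_batch_inference` and the `use_vllm` flag below are hypothetical and not part of the actual script:

```python
# Sketch: make the vllm import optional so the script still runs without it.
try:
    from vllm import LLM, SamplingParams  # only needed for the vLLM-backed path
    VLLM_AVAILABLE = True
except ImportError:
    LLM, SamplingParams = None, None
    VLLM_AVAILABLE = False


def run_batch_inference(prompts, model_name, use_vllm=False):
    """Run batch inference, using vLLM only when requested and available."""
    if use_vllm:
        if not VLLM_AVAILABLE:
            raise RuntimeError("vllm is not installed; run `pip install vllm` to use this path")
        llm = LLM(model=model_name)
        outputs = llm.generate(prompts, SamplingParams(max_tokens=256))
        return [o.outputs[0].text for o in outputs]

    # Fallback: ordinary transformers generation, no vllm import required.
    from transformers import pipeline
    generator = pipeline("text-generation", model=model_name)
    return [r[0]["generated_text"] for r in generator(prompts, max_new_tokens=256)]
```

With a guard like this, vllm can stay out of requirements.txt and only be installed by users who actually want the vLLM path.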