We are also evaluating this, and we likewise see that the output from the vLLM endpoint is worse, until we read https://docs.vllm.ai/en/latest/models/vlm.html and found that a specific prompt template needs to be followed on the vLLM side. Not sure if this is one factor you could check.
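For illustration, here is a minimal sketch (not the issue author's setup) of querying a vLLM OpenAI-compatible server through the chat endpoint, so the server applies the model's own chat/prompt template instead of receiving a raw string. The server URL, model name, and image URL are placeholders.

```python
# Sketch: let the vLLM server apply the model's chat template by using the
# OpenAI-compatible chat completions endpoint. All names below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2-VL-7B-Instruct",  # placeholder model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
            ],
        }
    ],
    temperature=0.0,
)
print(response.choices[0].message.content)
```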
起始日期 | Start Date
9/3/2024
实现PR | Implementation PR
No response
相关Issues | Reference Issues
No response
摘要 | Summary
When using vLLM to make better use of GPU memory and speed up inference and generation, there is a noticeable degradation in output quality compared to running the original model directly. This issue aims to address the quality drop and find ways to match the original model's output quality while keeping the speed improvements.
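One way to narrow down the cause is an A/B comparison: run the same prompt through the original Hugging Face model and through vLLM with identical decoding settings, so that any remaining difference points at the serving path rather than at sampling. The sketch below assumes a placeholder model name and prompt; it is not taken from the issue, and the two models may need to be loaded in separate processes if GPU memory is tight.

```python
# Hedged sketch: compare outputs from the original model and vLLM under
# matched greedy decoding. Model name and prompt are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder
prompt = "Explain the difference between a list and a tuple in Python."

# Reference generation with the original Hugging Face model (greedy decoding).
tokenizer = AutoTokenizer.from_pretrained(model_id)
hf_model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
inputs = tokenizer(prompt, return_tensors="pt").to(hf_model.device)
hf_out = hf_model.generate(**inputs, max_new_tokens=256, do_sample=False)
print("HF:", tokenizer.decode(hf_out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

# Same prompt through vLLM with matching greedy settings.
vllm_model = LLM(model=model_id)
params = SamplingParams(temperature=0.0, max_tokens=256)
vllm_out = vllm_model.generate([prompt], params)
print("vLLM:", vllm_out[0].outputs[0].text)
```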
基本示例 | Basic Example
No complete example is provided yet.
缺陷 | Drawbacks
- The current optimization leads to decreased output quality.
- Users may have to choose between speed and quality, which is not ideal.
- Potential increased complexity in configuration to balance speed and quality.
未解决问题 | Unresolved questions