OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Apache License 2.0

[vllm] - #555

Open WoutDeRijck opened 2 months ago

WoutDeRijck commented 2 months ago

Start Date

9/3/2024

Implementation PR

No response

Reference Issues

No response

Summary

When using vLLM to optimally utilize GPU space for faster inference and generation, there is a noticeable degradation in output quality compared to the original model. This issue aims to address the quality drop and find ways to match the original model's performance while maintaining the speed improvements.
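For reference, here is a minimal sketch of how one might query the original model through Hugging Face `transformers` to get a baseline output for comparison. Usage follows the MiniCPM-V 2.6 model card; the image path and question are placeholders and should be the same inputs used in the vLLM run.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

MODEL = "openbmb/MiniCPM-V-2_6"

# Load the original model as the quality baseline (bfloat16 on a single GPU).
model = AutoModel.from_pretrained(
    MODEL, trust_remote_code=True,
    attn_implementation="sdpa", torch_dtype=torch.bfloat16,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)

# Placeholder inputs: reuse the exact image/question from the degraded vLLM output.
image = Image.open("example.jpg").convert("RGB")
question = "Describe this image."
msgs = [{"role": "user", "content": [image, question]}]

# model.chat() is the chat helper exposed by the remote code in the model repo.
answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(answer)
```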

Basic Example

The output is incomplete; see the attached screenshot for an example (Screenshot 2024-09-03 at 16 39 52).

Drawbacks

  - The current optimization leads to decreased output quality.
  - Users may have to choose between speed and quality, which is not ideal.
  - Balancing speed and quality may increase configuration complexity.

Unresolved questions

  1. What specific aspects of the optimization are causing the quality degradation?
  2. Are there any configuration parameters that can be tuned to improve quality without sacrificing speed?
  3. Is it possible to implement a dynamic system that adjusts optimization based on the specific task or required quality level?
  4. How can we quantify and measure the quality degradation to better address the issue?
  5. Are there any alternative optimization techniques that could provide better quality-speed balance?
kennethzhu88 commented 2 months ago

We are evaluating this as well. The output from the vLLM endpoint is indeed worse, but then we read https://docs.vllm.ai/en/latest/models/vlm.html and found that a specific prompt template needs to be followed on the vLLM side. Not sure if this might be one factor you can check.
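For what it's worth, here is a rough sketch of offline vLLM inference that applies the model's chat template and image placeholder, adapted from the vLLM vision-language docs. The image path, question, sampling values, and stop tokens are assumptions and may need adjusting for your vLLM version.

```python
from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "openbmb/MiniCPM-V-2_6"

tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
llm = LLM(model=MODEL, trust_remote_code=True, max_model_len=4096)

# MiniCPM-V expects its image placeholder inside the chat-templated prompt.
messages = [{"role": "user", "content": "(<image>./</image>)\nDescribe this image."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Without the model's stop tokens, generation can run past the intended answer.
stop_token_ids = [tokenizer.eos_token_id] + [
    tokenizer.convert_tokens_to_ids(t) for t in ("<|im_end|>", "<|endoftext|>")
]
sampling = SamplingParams(temperature=0.7, max_tokens=512, stop_token_ids=stop_token_ids)

image = Image.open("example.jpg").convert("RGB")
outputs = llm.generate(
    {"prompt": prompt, "multi_modal_data": {"image": image}},
    sampling_params=sampling,
)
print(outputs[0].outputs[0].text)
```

If the raw question was being passed straight to the endpoint without this template, that alone could explain part of the quality gap versus the original model.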