Yuliang-Liu / Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models
MIT License
1.77k stars 122 forks

Question about patch splitting #131

Closed tigerzjh closed 5 days ago

tigerzjh commented 2 weeks ago

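1. What is the logic here when selecting the best ratio? What does 0.5 * image_size * image_size * ratio[0] * ratio[1] represent?

    elif ratio_diff == best_ratio_diff:
        if area > 0.5 * image_size * image_size * ratio[0] * ratio[1]:
            best_ratio = ratio

2. The adaptive implementation is really just picking the second-best ratio; there is nothing adaptive about it.

[image]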

mxin262 commented 2 weeks ago

Thank you for your interest in Mini-Monkey.

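  1. The logic for selecting the best ratio is similar to InternVL's: it keeps an image from being enlarged too much, far beyond its actual resolution.
  2. The adaptive implementation does not select the second-best ratio. It is adaptive in that, once the previous layer has selected a best aspect ratio, this layer uses that ratio to avoid every candidate ratio that is a multiple of it, then picks the best of the remaining ratios. This prevents an object or a piece of text from being cut in both layers at once. Put simply: the cut positions are chosen adaptively so that the same place is never cut in different layers at the same time.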
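
The two points above can be illustrated with a minimal sketch. This is not the repository's code. The tie-break follows the fragment quoted in the question. The names `candidate_ratios`, `drop_multiples` and `two_layer_ratios` are hypothetical, as are the defaults `image_size=448` and `max_num=6`. The "multiple" test (a candidate is dropped when its column count divides the first layer's column count, or its row count divides the first layer's row count) is one plausible reading of the explanation, not a confirmed rule.

```python
def find_closest_aspect_ratio(aspect_ratio, target_ratios, width, height, image_size):
    """Return the (cols, rows) grid whose aspect ratio is closest to the image's."""
    best_ratio_diff = float("inf")
    best_ratio = (1, 1)
    area = width * height
    for ratio in target_ratios:
        ratio_diff = abs(aspect_ratio - ratio[0] / ratio[1])
        if ratio_diff < best_ratio_diff:
            best_ratio_diff = ratio_diff
            best_ratio = ratio
        elif ratio_diff == best_ratio_diff:
            # Tie: switch to this grid only if the image already covers more
            # than half of the grid's pixel area, so it is not upscaled far
            # beyond its real resolution.
            if area > 0.5 * image_size * image_size * ratio[0] * ratio[1]:
                best_ratio = ratio
    return best_ratio


def candidate_ratios(min_num=1, max_num=6):
    """All (cols, rows) grids with min_num..max_num tiles, smallest grids first."""
    ratios = [
        (cols, rows)
        for cols in range(1, max_num + 1)
        for rows in range(1, max_num + 1)
        if min_num <= cols * rows <= max_num
    ]
    return sorted(ratios, key=lambda r: (r[0] * r[1], r))


def drop_multiples(target_ratios, prior_ratio):
    """Assumed rule: keep a grid only if neither of its counts divides the prior grid's."""
    return [
        r for r in target_ratios
        if prior_ratio[0] % r[0] != 0 and prior_ratio[1] % r[1] != 0
    ]


def two_layer_ratios(width, height, image_size=448, max_num=6):
    """Pick the first-layer grid, then the best grid among the non-multiples."""
    aspect = width / height
    ratios = candidate_ratios(1, max_num)
    first = find_closest_aspect_ratio(aspect, ratios, width, height, image_size)
    remaining = drop_multiples(ratios, first)
    if not remaining:
        return first, None  # no compatible second-layer grid under this rule
    second = find_closest_aspect_ratio(aspect, remaining, width, height, image_size)
    return first, second
```

Under these assumptions, an 800 x 1200 image gets a 2 x 3 grid in the first layer. The only candidate left after filtering is 3 x 2, so the second layer cuts at different positions even though 3 x 2 is far from the second-closest aspect ratio overall.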
mxin262 commented 1 week ago

Has your question been resolved? Is there anything else we can help with?

mxin262 commented 5 days ago

Since there was no response for a long time, I closed it. Please feel free to reopen it if you have any further questions.