X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License

When running coordinate detection on an image, are the generated bboxes relative to the image after it is resized to a square? #199

Open zhaop-l opened 8 months ago

zhaop-l commented 8 months ago

When running coordinate detection on an image, are the generated bboxes relative to the image after it is resized to a square?

The example code resizes the image to a square, and the results I get are slightly off, so I would like to know: do you use a square image?

fansticOne commented 8 months ago

How should I design the prompt to run coordinate detection on an image and output a bbox? Thanks.

zhaop-l commented 8 months ago

How should I design the prompt to run coordinate detection on an image and output a bbox? Thanks.

Please provide the bounding box coordinate of the region this sentence describes : people on car.

fansticOne commented 8 months ago

How should I design the prompt to run coordinate detection on an image and output a bbox? Thanks.

Please provide the bounding box coordinate of the region this sentence describes : people on car.

I tried this prompt, but it did not return a bbox. The response was: The bounding box coordinate of the region this sentence describes is the sheep lying on the ground.

LukeForeverYoung commented 8 months ago

Yes, the images are resized to squares, for example 448x448. However, the generated coordinates should be values in the range [0, 1], that is, ratios that are independent of the actual resolution. Therefore, however you resize the image, the coordinates still apply. You may need to check whether the image has been cropped, since cropping could offset the coordinates relative to the original image.
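To illustrate the point about ratios, here is a minimal sketch of mapping a normalized box back to pixel coordinates. The function name `to_pixel_bbox` is hypothetical, and the `(x1, y1, x2, y2)` ordering is an assumption; the thread does not state the order in which the model emits coordinates, so check it against your own outputs.

```python
def to_pixel_bbox(bbox, width, height):
    """Scale a normalized box to pixel coordinates.

    Assumes bbox is (x1, y1, x2, y2) with every value in [0, 1];
    this ordering is an assumption, not something the thread confirms.
    """
    if not all(0.0 <= v <= 1.0 for v in bbox):
        raise ValueError("expected normalized coordinates in [0, 1]")
    x1, y1, x2, y2 = bbox
    # x values scale with the image width, y values with the height.
    return (
        round(x1 * width),
        round(y1 * height),
        round(x2 * width),
        round(y2 * height),
    )


# The same ratios apply to the original image and to the resized square.
normalized = (0.25, 0.5, 0.75, 1.0)
original_box = to_pixel_bbox(normalized, 640, 480)
square_box = to_pixel_bbox(normalized, 448, 448)
```

Pass the width and height of the image you want to draw on, which is normally the original image rather than the resized square. If the preprocessing cropped the image, this scaling alone will not be enough, because the crop offset would also have to be added back.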

fansticOne commented 7 months ago

Which weight file is used for coordinate detection on images?