GLM-4V-9B Bounding Box - Githubissues

Stanleyluuuu commented 2 months ago

System Info / 系統信息

Hi,

I'm using GLM-4v-9B to develop a feature that allows users to input an image and receive the corresponding bounding box. For example, the prompt might be: "Is there any person fall down? Give me the bounding box in (x1, y1, x2, y2) format if exists."

However, I noticed that the bounding box does not fully enclose the person who has fallen. Could you provide any guidance or instructions regarding the bounding box output?

Who can help? / 谁可以帮助到您？

No response

Information / 问题信息

[ ] The official example scripts / 官方的示例脚本
[X] My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

Input an image and give the prompt "Is there any person fall down? Give me the bounding box in (x1, y1, x2, y2) format if exists."
Plot the output bounding box on the image.

Expected behavior / 期待表现

I expect to understand how to guide the model to output bounding box coordinate in the format I want.

zRzRzRzRzRzRzR commented 2 months ago

This model hasn’t been trained for grounding, so it doesn’t effectively output bounding boxes (bbx) for grounding tasks.

A good suggestion would be to fine-tune the model using a labeled dataset, like the one you mentioned with bbx, to improve its grounding capabilities. However, this process can be complex, particularly in terms of preparing the dataset, which poses a significant challenge.

Stanleyluuuu commented 2 months ago

OK, I understand. Thanks for the clear explanation.

THUDM / GLM-4

GLM-4V-9B Bounding Box #497

System Info / 系統信息

Who can help? / 谁可以帮助到您？

Information / 问题信息

Reproduction / 复现过程

Expected behavior / 期待表现