Closed by laserwave 2 months ago
Hi, this is an interesting question. Our 4khd model does support grounding (btw, xcomposer2 also has strong grounding capability). We provide the image width and height to the model, and it predicts pixel coordinates directly. Here are a few examples. 4khd works well on images of different sizes.
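To make the "width and height in, pixel coordinates out" idea concrete, here is a minimal sketch. The prompt wording, the helper names `build_grounding_prompt` and `parse_pixel_box`, and the assumed answer format (four integers in `x1, y1, x2, y2` order) are all hypothetical and are not taken from the model's documentation, so check the official examples for the real template.

```python
import re
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]


def build_grounding_prompt(expression: str, width: int, height: int) -> str:
    # Hypothetical template: the real prompt format used by the model may differ.
    return (
        f"The image is {width} pixels wide and {height} pixels high. "
        f"Give the bounding box of: {expression}"
    )


def parse_pixel_box(answer: str, width: int, height: int) -> Optional[Box]:
    # Assumption: the answer contains at least four non-negative integers,
    # and the first four are x1, y1, x2, y2 in pixels.
    numbers = [int(n) for n in re.findall(r"\d+", answer)]
    if len(numbers) < 4:
        return None
    x1, y1, x2, y2 = numbers[:4]
    # Put the corners in top-left / bottom-right order.
    x1, x2 = sorted((x1, x2))
    y1, y2 = sorted((y1, y2))
    # Clamp to the image bounds so a slightly out-of-range prediction stays usable.
    return (
        min(x1, width),
        min(y1, height),
        min(x2, width),
        min(y2, height),
    )
```

Because the box is already in pixels, no rescaling from a normalized range is needed before drawing or cropping. The clamp only guards against predictions that fall slightly outside the image.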
thank you
@LightDXY Hi, does it support REG, i.e. an image captioning or VQA task corresponding to a given image region/bounding box?
Hi, REG is also supported, as it is the symmetric task of grounding. For the VQA task corresponding to a given image region/bounding box, we did not use such data in training.
Does xcomposer2-4khd support the REC (referring expression comprehension) and REG (referring expression generation) tasks? Does dynamic resolution make these two tasks hard to learn?