FuxiaoLiu / LRV-Instruction

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
https://fuxiaoliu.github.io/LRV/
BSD 3-Clause "New" or "Revised" License

Question about the coordinate in the instruction #14

Closed: Richar-Du closed this issue 1 year ago

Richar-Du commented 1 year ago

Thanks for your awesome work! I noticed that some samples contain bounding-box coordinates, for example:

question: Are the white feathers of a bird at X:437 Y:0 with Width:60 Height:60?
answer: These dimensions and coordinates refer to the green foliage in the background, not white feathers of a bird.
task: negative

However, all the images are resized before training. For example, MiniGPT-4 resizes the images to 224*224, which is smaller than the range of the coordinates (X:437 in the example above already falls outside a 224-pixel-wide image). I wonder how to deal with such cases. Thanks in advance :)

FuxiaoLiu commented 1 year ago

Thanks for your comments! To ensure the quality of the dataset, we have removed the instructions that contain coordinates for now. You can download the updated version with: gdown https://drive.google.com/uc?id=1pWkxE2kqpys1VdwBi99ZXN6-XY5SqhwU or gdown https://drive.google.com/uc?id=1NTxkuRPlvDn7aWaJpK_yb0p5r0cxPLNZ

As for future plans, we will rescale the coordinates in the instructions and answers based on the original image size and the new image size of 224*224. We will release them next week.
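
For readers who want to try this before the update is released, here is a minimal sketch of how such a rescaling could look. It is not the authors' implementation. It assumes the image is resized directly to 224*224 without cropping or padding, that coordinates always follow the "X:.. Y:.. with Width:.. Height:.." format from the example question, and that the original image size is known for each sample. The names `rescale_box` and `rescale_text` and the 640*480 original size used in the usage lines are hypothetical.

```python
import re

# Matches the coordinate format seen in the example question in this thread.
# Other phrasings in the dataset (if any) would need their own patterns.
BOX_PATTERN = re.compile(r"X:(\d+) Y:(\d+) with Width:(\d+) Height:(\d+)")


def rescale_box(x, y, w, h, orig_size, new_size=(224, 224)):
    """Map a box from the original image to the resized image.

    orig_size and new_size are (width, height) tuples. Assumes a plain
    resize (no crop, no padding), so each axis scales independently.
    """
    orig_w, orig_h = orig_size
    new_w, new_h = new_size
    return (
        round(x * new_w / orig_w),
        round(y * new_h / orig_h),
        round(w * new_w / orig_w),
        round(h * new_h / orig_h),
    )


def rescale_text(text, orig_size, new_size=(224, 224)):
    """Rewrite every coordinate mention in an instruction or answer."""

    def _replace(match):
        x, y, w, h = (int(v) for v in match.groups())
        nx, ny, nw, nh = rescale_box(x, y, w, h, orig_size, new_size)
        return f"X:{nx} Y:{ny} with Width:{nw} Height:{nh}"

    return BOX_PATTERN.sub(_replace, text)


# Usage with a hypothetical 640*480 original image.
question = (
    "Are the white feathers of a bird at X:437 Y:0 "
    "with Width:60 Height:60?"
)
print(rescale_text(question, orig_size=(640, 480)))
```

If the training pipeline crops or pads the image instead of stretching it, the per-axis scaling above no longer applies and an offset would have to be added as well.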

Richar-Du commented 1 year ago

Great! Thanks for your effort and we're looking forward to your update :)