NJU-LHRS / LHRS-Bot

VGI-Enhanced multimodal large language model for remote sensing images.
Apache License 2.0

The visual grounding results don't seem to work out so well. #25

Open MingkunLishigure opened 2 months ago

MingkunLishigure commented 2 months ago

Hello, thank you for your outstanding work! I am trying to test LHRS-Bot on some other datasets, such as HRSC-2016, which is a remote sensing dataset of different types of ships. I am using FINAL.pt from the development checkpoint. The inference results are shown in the images below: test1 test2 test3 test4

From the results, the model seems to describe the images reasonably well, but its performance on visual localisation tasks is not satisfactory. Is that because the dataset I am using is too niche, because there is a problem with the prompt I used, or because there is an error in my visualisation process?

I also tested on classic images and found a similar problem. Could you tell me whether you get better results on the same type of images with the current model?
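To help rule out the visualisation step, here is a minimal sketch of the box conversion one might check before drawing. It assumes the model returns boxes as `(x1, y1, x2, y2)` on a normalised grid (0 to 1, or 0 to 100 with `scale=100`); the thread does not confirm LHRS-Bot's actual output format, and `to_pixel_box` is a hypothetical helper, not part of the repository.

```python
def to_pixel_box(box, img_w, img_h, scale=1.0):
    """Map a box (x1, y1, x2, y2) given on a 0..scale grid to pixel coordinates.

    ASSUMPTION: the model emits normalised corner coordinates. If it emits
    pixels of a resized input instead, use that resized width/height as the
    grid and rescale to the original image size.
    """
    x1, y1, x2, y2 = box
    # Order the corners so the box stays valid even if they arrive swapped.
    x1, x2 = sorted((x1, x2))
    y1, y2 = sorted((y1, y2))

    def clamp(value, upper):
        # Keep the coordinate inside the image bounds.
        return max(0, min(upper, round(value)))

    return (
        clamp(x1 / scale * img_w, img_w),
        clamp(y1 / scale * img_h, img_h),
        clamp(x2 / scale * img_w, img_w),
        clamp(y2 / scale * img_h, img_h),
    )


# Example: the same box expressed on a 0..1 grid and on a 0..100 grid.
print(to_pixel_box((0.25, 0.5, 0.75, 1.0), 800, 600))
print(to_pixel_box((25, 50, 75, 100), 800, 600, scale=100))
```

If the drawn boxes are consistently shifted or shrunk, a mismatch between the grid assumed here and the one the model actually uses (or an unaccounted resize or padding step) would be one possible cause.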

Image: demo

[VG] Bus:

test5

Thank you very much!

pUmpKin-Co commented 2 months ago

Hi!

Thank you for pointing this out.

I believe the root problem is related to #27.

We will keep improving the model's VG ability.

Thanks!

MingkunLishigure commented 2 months ago

Thank you for your answer! Hope everything goes well!