Alpha-VLLM / LLaMA2-Accessory

An Open-source Toolkit for LLM Development
https://llama2-accessory.readthedocs.io/

LLaMA2-Adapter x Region demo setting #63

Closed · erjui closed this 8 months ago

erjui commented 1 year ago

Hi, first of all, thanks for the nice work.

The multi-modal LLaMA2-Adapter x Region demo is really nice; it even finds bounding boxes in the given image. However, when I tried the same thing on the LLaMA2-Adapter demo page (http://llama-adapter.opengvlab.com/), the model did not give a proper answer.

So I wonder whether there is a difference in how the LLaMA2-Adapter model is trained/fine-tuned on multi-modal visual instructions.

Thanks in advance :)

[screenshot attachment: image (1)]

Artanic30 commented 1 year ago

Hello, thank you for your interest in our work. At the moment, the demo model does not incorporate bounding boxes, which means it cannot generate the location information requested in the prompt. However, we are actively working on a LLaMA2-Adapter with localization capabilities, and we will share the demo with you as soon as our research is complete.

erjui commented 1 year ago

Thanks a lot for the answer! I'm very curious, though, about how you incorporate bounding-box localization from the prompt. Could you tell me how you integrated that localization ability?

gaopengpjlab commented 1 year ago

We will release the training pipeline for incorporating bounding boxes into the LLM as soon as possible. Stay tuned.
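
(For context while the pipeline is unreleased: a common way grounding works teach an LLM to localize is to serialize normalized box coordinates as plain text inside the visual-instruction data, so the model learns to read and emit coordinates as ordinary tokens. The sketch below illustrates that general idea only; the coordinate format and helper names are illustrative, not this repository's actual pipeline.)

```python
# Hedged sketch of text-serialized bounding boxes for visual instruction
# tuning. This is NOT LLaMA2-Accessory's actual training pipeline; the
# format and function names are hypothetical.

def box_to_text(box, img_w, img_h, precision=3):
    """Serialize a pixel-space box (x1, y1, x2, y2) as normalized text.

    Illustrative format: "[x1, y1, x2, y2]" with coordinates in [0, 1].
    """
    x1, y1, x2, y2 = box
    norm = [x1 / img_w, y1 / img_h, x2 / img_w, y2 / img_h]
    return "[" + ", ".join(f"{v:.{precision}f}" for v in norm) + "]"


def make_grounding_sample(question, label, box, img_w, img_h):
    """Build one instruction/response pair whose answer contains the box."""
    return {
        "instruction": question,
        "response": f"The {label} is at {box_to_text(box, img_w, img_h)}.",
    }


if __name__ == "__main__":
    sample = make_grounding_sample(
        question="Where is the dog in the image?",
        label="dog",
        box=(48, 120, 300, 415),  # pixel coordinates from an annotation
        img_w=640,
        img_h=480,
    )
    print(sample["response"])
    # -> The dog is at [0.075, 0.250, 0.469, 0.865].
```

Because the boxes become ordinary tokens, no architectural change is strictly required: the model is fine-tuned on such pairs and learns to emit coordinates in its text output, which the demo can then parse and draw.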

erjui commented 1 year ago

Thanks a lot for the answer! 👍

gaopengpjlab commented 9 months ago

Please try SPHINX, which shows strong performance on object spotting.
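
(A hedged usage sketch for readers landing here: the class and method names below are placeholders, not SPHINX's documented API; consult the SPHINX folder in the LLaMA2-Accessory repository for the actual loading and inference entry points.)

```python
# Hypothetical sketch only: SPHINXModel, from_pretrained, and
# generate_response are placeholder names, not the documented API.
from PIL import Image

from SPHINX import SPHINXModel  # assumed import path; check the README

# Load a pretrained multi-modal checkpoint (path is a placeholder).
model = SPHINXModel.from_pretrained("path/to/sphinx/checkpoint")

image = Image.open("example.jpg")

# Ask for box-level grounding; grounding-capable models typically answer
# with textual coordinates such as "[0.075, 0.250, 0.469, 0.865]".
prompt = "Please detect the dog in the image and output its bounding box."
answer = model.generate_response(prompt, image)
print(answer)
```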