Alpha-VLLM / LLaMA2-Accessory

An Open-source Toolkit for LLM Development
https://llama2-accessory.readthedocs.io/

multi_turn_mm_box not working for Sphinx #92

Open saffie91 opened 10 months ago

saffie91 commented 10 months ago

Hello,

First of all, I gotta say I love this model and how well it performs in your official demo. Fantastic job!

I am trying to run it on my machine, and while multi_turn_mm works, I can't get the box one to work. It asks for a llama_config, so I tried the two you have in the finetune directory: configs/model/finetune/sg/llamaPeft_normBiasLora.json and configs/model/finetune/mm/llamaPeft_normBiasLora.json

However, that doesn't seem to be acceptable input, as I get this error:

TypeError: init() got an unexpected keyword argument 'lora_rank'

Is there another config json file for Sphinx box?

I tried to just go ahead without the llama config but then I'm not getting any bounding boxes at all.

Would appreciate your help figuring this one out.

Thanks in advance.

ChrisLiu6 commented 10 months ago

Hi! Thank you for your interest in our work.

I tried to just go ahead without the llama config but then I'm not getting any bounding boxes at all.

That is the right way to launch the multi_turn_mm_box demo. The reason your model does not output boxes is that the checkpoint released on Hugging Face is a much earlier version than the one behind our official demo. We are uploading the latest checkpoint right now and will notify you when it is finished.

saffie91 commented 10 months ago

Thank you for the quick response! I'm waiting for the new model upload then :)

gaopengpjlab commented 10 months ago

https://huggingface.co/Alpha-VLLM/LLaMA2-Accessory/tree/main/finetune/mm/SPHINX

ChrisLiu6 commented 10 months ago

Thank you for the quick response! I'm waiting for the new model upload then :)

Hi, the checkpoints have been uploaded and the documentation has also been updated.

saffie91 commented 10 months ago

Thank you for the quick response! I'm waiting for the new model upload then :)

Hi, the checkpoints have been uploaded and the documentation has also been updated.


Hey, thanks for uploading the model and answering so quickly. I have it running now; however, my demo's bounding boxes are way off. I assume it has something to do with the size of the image (aspect ratio, cropping, or something like that). Do you have any idea how I might fix this?

ChrisLiu6 commented 10 months ago

Hi, could you please provide the command you use to launch the demo? We've experienced similar problems and finally found that we were using a SPHINX (llama_ens) model but loading Long-SPHINX (llama_ens5) checkpoints.

saffie91 commented 10 months ago

python3.9 demos/multi_turn_mm_box.py --n_gpus=2 --tokenizer_path=../../LLaMA2-Accessory_old/tokenizer/tokenizer.model --llama_type=llama_ens5 --pretrained_path ../../LLaMA2-Accessory_old/updated_model_ckpt_path/

This is the command. I disabled SAM because I can't fit everything into 2 GPUs otherwise. updated_model_ckpt contains consolidated.00-of-02.model and consolidated.01-of-02.model

https://huggingface.co/Alpha-VLLM/LLaMA2-Accessory/tree/main/finetune/mm/SPHINX/Long-SPHINX

I downloaded it from there. Although I gotta admit, it's a bit confusing that both models' files have the same names.

ChrisLiu6 commented 10 months ago

python3.9 demos/multi_turn_mm_box.py --n_gpus=2 --tokenizer_path=../../LLaMA2-Accessory_old/tokenizer/tokenizer.model --llama_type=llama_ens5 --pretrained_path ../../LLaMA2-Accessory_old/updated_model_ckpt_path/

This is the command. I disabled SAM because I can't fit everything into 2 GPUs otherwise. updated_model_ckpt contains consolidated.00-of-02.model and consolidated.01-of-02.model

https://huggingface.co/Alpha-VLLM/LLaMA2-Accessory/tree/main/finetune/mm/SPHINX/Long-SPHINX

I downloaded it from there. Although I gotta admit, it's a bit confusing that both models' files have the same names.

That's really weird. I just checked the checkpoints and they work fine. Would you mind checking the md5sums of the checkpoints? They should be:

2d6df8d325e5730de30c0a6a70c2bdcc  consolidated.00-of-02.model.pth
26639e5beb5c18ec057cd66f8ed2e216  consolidated.01-of-02.model.pth
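For reference, a small generic Python sketch (not part of LLaMA2-Accessory; `md5_of_file` is a hypothetical helper) that computes an MD5 digest by streaming the file in chunks, so multi-gigabyte checkpoints don't need to fit in memory:

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# e.g. print(md5_of_file("consolidated.00-of-02.model.pth"))
```

The same check can of course be done with the `md5sum` command-line tool on Linux.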

Please also make sure that you've pulled the latest version of LLaMA2-Accessory.

Finally, according to our data processing pipeline, images are first padded into squares, and (Long-)SPHINX is trained to predict normalized coordinates w.r.t. the padded image. You may check whether the coordinates output by SPHINX are themselves wrong, or whether the coordinates are right but the boxes are displayed incorrectly.

gaopengpjlab commented 10 months ago

@saffie91 What's your prompt for object detection?

s-mahajan commented 4 months ago

Can you share the exact manner in which padding and bounding boxes are "post-processed"? In the main paper, too, the bounding box around the car does not make sense; it is definitely not 0.37 and 0.90. Thanks in advance!

ChrisLiu6 commented 4 months ago

Can you share the exact manner in which padding and bounding boxes are "post-processed"? In the main paper, too, the bounding box around the car does not make sense; it is definitely not 0.37 and 0.90. Thanks in advance!

  1. First pad the image to a square, with max(width, height) as the edge length; the original image is centered in the padded image.
  2. The coordinates are relative to the padded square image.

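To make the two steps above concrete, here is a minimal sketch (not from the repo; `box_padded_to_original` is a hypothetical helper) that maps a normalized box predicted on the padded square back to pixel coordinates in the original image:

```python
def box_padded_to_original(box, width, height):
    """
    box: (x1, y1, x2, y2), each in [0, 1], relative to the padded square.
    width, height: original image size in pixels.
    Returns the box in pixel coordinates of the original (unpadded) image.
    """
    side = max(width, height)        # edge length of the padded square
    pad_x = (side - width) / 2       # original image is centered,
    pad_y = (side - height) / 2      # so padding is split evenly
    x1, y1, x2, y2 = box
    return (x1 * side - pad_x,
            y1 * side - pad_y,
            x2 * side - pad_x,
            y2 * side - pad_y)

# For a 200x100 image, a predicted box spanning the full visible region
# is (0.0, 0.25, 1.0, 0.75) on the padded square, since the top and
# bottom quarters of the square are padding.
```

If the demo skips this de-padding step (or applies it with the wrong image size), the boxes will appear systematically shifted or stretched, which matches the symptom described above.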
s-mahajan commented 4 months ago

Thanks a lot!