Open Chelsea-abab opened 3 weeks ago
Hi! Sorry for the late reply.
dino_final_v2_d7.json stores the output results from Grounding DINO.
Basically, the complete pipeline works as follows: we provide Grounding DINO with a prompt, such as an organ name (e.g., kidney), along with an image, and use Grounding DINO to detect the location of that prompt in the image. We then record the information needed for the subsequent interaction with GPT in the given notebook.
I have uploaded the example file dino_final_example.json; you can have fun with it. In theory, though, the format isn't standardized, so you can design how to save these results yourself. The key elements are the prompt and the bounding box coordinates output by Grounding DINO.
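Since the schema is up to you, here is a hedged sketch of what one saved record could look like. All field names and values below are hypothetical illustrations, not the format of the actual example file; the only essential pieces are the prompt and the bounding box coordinates.

```python
import json

# Hypothetical record format -- the repo does not mandate a schema; the only
# essential fields are the text prompt and the predicted bounding box(es).
record = {
    "image": "example_ct_scan.png",        # hypothetical image filename
    "prompt": "kidney",                    # text prompt given to Grounding DINO
    "boxes": [[0.31, 0.42, 0.58, 0.71]],   # [x0, y0, x1, y1] per detection
    "scores": [0.87],                      # detection confidence (optional)
}

with open("dino_results_example.json", "w") as f:
    json.dump([record], f, indent=2)
```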
Thanks so much for your reply and the example file! This helps a lot as I explore applying your MMedAgent to other modalities. By the way, I have another question about reproducing your work. If I want to run inference following the instructions in your evaluation section, I need to prepare my own "llava_med_agent" model by following your training section and the apply-LoRA section, right? It seems that you didn't provide your "llava_med_agent" models, and the MMedAgent checkpoint "final_model_lora" can't be used directly to generate results.
Hi! The "final_model_lora" can be downloaded directly, but you will need to run this to merge the LoRA weights with the base model. For "--model-base ./base_model", you need to follow our instructions here to download LLaMA-7B and LLaVA-Med and apply the LoRA weights. Sorry, we couldn't directly provide the original base_model due to its license, but you can follow the instructions to obtain the models. :)
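For reference, the merge step can be sketched with the Hugging Face `peft` library. This is only a sketch under the assumption that the checkpoints load with `transformers`/`peft`; all paths are placeholders, and the repo's own merge script should be preferred where it exists.

```python
def merge_lora(base_model_path: str, lora_path: str, save_path: str) -> None:
    """Load a LoRA adapter on top of a base model, merge the weights,
    and save the merged model. A sketch assuming `peft` and `transformers`;
    the repo's own merge script may differ."""
    # Imports are deferred so the function can be defined even where the
    # heavyweight dependencies are not installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_model_path)
    model = PeftModel.from_pretrained(base, lora_path)
    merged = model.merge_and_unload()  # fold LoRA deltas into the base weights
    merged.save_pretrained(save_path)
    AutoTokenizer.from_pretrained(base_model_path).save_pretrained(save_path)

# e.g. merge_lora("./base_model", "./final_model_lora", "./llava_med_agent")
```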
Thanks so much for your reply! I now understand how to run inference by following your instructions. One remaining question: how do I fine-tune your released MMedAgent checkpoints on a new modality, such as fundus images? I can prepare the instruction-tuning data by following your provided prompts and the .ipynb. To fine-tune from the provided checkpoints, should I just follow the training instructions but replace "--model_name_or_path ./base_model" with "--model_name_or_path ./llava_med_agent"? Really sorry to bother you so many times!
If you merge the LoRA weights and save the whole model into './llava_med_agent', the answer is yes.
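Put together, the only flag that changes between the two setups is the model path. A minimal illustration (all other training arguments follow the repo's training instructions unchanged):

```python
# Fine-tuning from the original base model vs. from the merged MMedAgent
# checkpoint: only --model_name_or_path differs. Paths are placeholders
# matching the discussion above.
train_from_base = ["--model_name_or_path", "./base_model"]
train_from_mmedagent = ["--model_name_or_path", "./llava_med_agent"]
```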
For the file instruction_generation.ipynb: what is the file "dino_final_v2_d7.json", and what should it contain? What does instruction_generation.ipynb do? Sorry for all the questions.