Wangyixinxin / MMedAgent

Learning to Use Medical Tools with Multi-modal Agent

Query about the instruction_generation.ipynb #2

Open Chelsea-abab opened 3 weeks ago

Chelsea-abab commented 3 weeks ago

Regarding instruction_generation.ipynb: what is the file "dino_final_v2_d7.json", and what should it contain? And what does instruction_generation.ipynb actually do? Sorry for all the questions.

AndyCA111 commented 2 weeks ago

Hi! Sorry for the late reply.

dino_final_v2_d7.json stores the output results from Grounding DINO.

Basically, the complete pipeline works as follows: we provide Grounding DINO with a prompt, such as an organ name (e.g., kidney), along with an image, and then use Grounding DINO to detect the location of that prompt in the image. The notebook records the information needed for the subsequent interaction with GPT.
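Roughly, the detection step looks something like the sketch below. This is only an illustration using the open-source groundingdino inference helpers; the config path, weights file, image path, and thresholds here are placeholders, not the exact ones used for MMedAgent.

```python
# Illustrative sketch of the Grounding DINO detection step (not the exact
# MMedAgent pipeline). Paths and thresholds are placeholders.
from groundingdino.util.inference import load_model, load_image, predict

model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",  # model config
    "weights/groundingdino_swint_ogc.pth",               # pretrained weights
)
image_source, image = load_image("examples/ct_abdomen_001.png")  # hypothetical image path

# The text prompt is the target organ, e.g. "kidney".
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="kidney",
    box_threshold=0.35,
    text_threshold=0.25,
)
# `boxes` holds normalized (cx, cy, w, h) coordinates and `phrases` the
# matched prompt text; these are what get recorded for the GPT step.
```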

I have uploaded the example file dino_final_example.json, so you can have fun with it. That said, the format isn't standardized, so you can design how to save these results yourself; the key elements are the prompt and the bounding box coordinates output by Grounding DINO.
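For instance, one saved record could look something like this (the field names are purely illustrative and do not match the exact schema of dino_final_example.json):

```python
# Illustrative record format; field names are hypothetical, not the exact
# schema of dino_final_example.json.
import json

record = {
    "image": "examples/ct_abdomen_001.png",   # hypothetical image path
    "prompt": "kidney",                       # text prompt given to Grounding DINO
    "boxes": [[0.42, 0.31, 0.18, 0.22]],      # normalized (cx, cy, w, h) per detection
    "scores": [0.87],                         # detection confidences
}

with open("dino_results.json", "w") as f:
    json.dump([record], f, indent=2)
```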

Chelsea-abab commented 2 weeks ago

Thanks so much for your reply and the example file! This helps a lot with exploring the application of your MMedAgent to other modalities. By the way, I have another question related to reproducing your work. If I want to run inference by following the inference instructions in your evaluation section, I need to prepare my own "llava_med_agent" model by following your training section and the apply-LoRA section, right? It seems that you didn't provide your "llava_med_agent" models, and the MMedAgent checkpoint "final_model_lora" can't be used directly to generate results?

Wangyixinxin commented 2 weeks ago

Hi! The "final_model_lora" can be downloaded directly, but you will need to run this to merge the LoRA weights with it. The "--model-base ./base_model" is the model you need to build yourself by following our instructions here to download llama7b and llava-med and apply LoRA. Sorry, we couldn't directly provide the original base_model due to its license, but you can follow the instructions to obtain the models. :)
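Conceptually, the merge step folds the LoRA weights into the base model and saves the result as a standalone model. The sketch below uses the generic transformers/peft APIs only to show the idea; in practice you should run the repo's own merge command, since LLaVA-Med relies on custom model classes that AutoModelForCausalLM may not load correctly.

```python
# Conceptual sketch of merging the LoRA checkpoint into the base model
# (use the repo's merge script in practice; this is only illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("./base_model")    # base model built per the repo instructions
model = PeftModel.from_pretrained(base, "./final_model_lora")  # downloaded MMedAgent LoRA checkpoint
model = model.merge_and_unload()                               # fold LoRA deltas into the base weights

tokenizer = AutoTokenizer.from_pretrained("./base_model")
model.save_pretrained("./llava_med_agent")                     # merged model for inference or further finetuning
tokenizer.save_pretrained("./llava_med_agent")
```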

Chelsea-abab commented 2 weeks ago

Thanks so much for your reply! I now fully understand how to run inference by following your instructions. One small remaining question: how do I finetune your well-trained MMedAgent checkpoints on a new modality, such as fundus images? I can prepare the instruction-tuning data by following your provided prompts and .ipynb. To finetune from the provided checkpoints, should I just follow the training instructions but replace "--model_name_or_path ./base_model" with "--model_name_or_path ./llava_med_agent"? Really sorry to bother you so many times!!

AndyCA111 commented 1 week ago

If you merge the LoRA weights and save the whole model into './llava_med_agent', the answer is yes.