EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval
https://lmms-lab.github.io/

How to evaluate LLaVA-OneVision fine-tuned with a custom dataset? #241

Closed Bleking closed 1 day ago

Bleking commented 1 week ago

I would like to ask whether I have to create a new Python file for my fine-tuned model in the 'lmms_eval/models' directory and define a class for the model there, or whether I can just use the Python file already provided for the OneVision model in that directory.

I recently fine-tuned LLaVA-OneVision (using Qwen/Qwen2-0.5B-Instruct) on my custom dataset (single-image only) and need to evaluate the model using lmms-eval.

I guess I will have to follow the "Add customized model and dataset" instructions and the Model guide, but I am not sure if I am doing it correctly.

So far, I have forked this repository and cloned it to my Linux server environment. I also created a branch with git checkout -b Qwen/Qwen2-0.5B-Instruct and set up a separate virtual environment for lmms-eval, distinct from the one used for LLaVA.

To sum up, I would like to ask what I have to do in order to evaluate my model fine-tuned on my custom dataset. Should I:

  1. Create a new Python file for my fine-tuned model, or
  2. Just use the existing llava_onevision.py file?
kcz358 commented 1 week ago

If your inference logic is the same as llava_onevision, you can just reuse llava_onevision.py and add pretrained=<your_local_path> to the model_args.
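
For example, following the same pattern as the README launch commands (the checkpoint path and task here are just placeholders), it would look roughly like this:

```shell
# Rough sketch -- substitute your own checkpoint path and task list.
accelerate launch --num_processes=8 -m lmms_eval \
    --model llava_onevision \
    --model_args pretrained=/path/to/your/finetuned/checkpoint \
    --tasks mme \
    --batch_size 1
```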

Bleking commented 1 week ago

Thank you for the information!

[screenshot: directory listing of the fine-tuning output, including OneVision-siglip-Qwen2-0.5B]

The directory named "OneVision-siglip-Qwen2-0.5B" is the result of fine-tuning. To use it, I think I have to run something like this: python3 -m accelerate.commands.launch --num_processes=8 -m lmms_eval --model llava_onevision --model_args pretrained="/home/work/testdataset1/LLaVA-NeXT/checkpoints/OneVision-results/OneVision-siglip-Qwen2-0.5B"

Do I have to do any processing before that? For example, in LLaVA-v1.5 we need to run merge_lora_weights to produce the model files. I am wondering whether the same is needed for LLaVA-OneVision, and for LLaVA-NeXT as well.

I am confused, since things are quite different from the previous LLaVA versions.

Bleking commented 1 week ago

And another question: since I have already made my own test (evaluation) dataset for LLaVA-v1.5, I think I should keep using it for LLaVA-NeXT and LLaVA-OneVision. Do I have to use the 'tasks' argument and define my own task? If so, how can we add our own test data? I am trying to learn lmms-eval, but I still need some help understanding how it works.

Thank you.

kcz358 commented 1 week ago

For merging LoRA, I believe the preprocessing is the same. You need a final safetensors file so that the model can be loaded. But I haven't tried LoRA models in our pipeline, so feel free to correct me if I am wrong.
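
For reference, in LLaVA-v1.5 the merge step is scripts/merge_lora_weights.py; assuming LLaVA-NeXT/OneVision ships the same script with the same flags (I haven't verified this), it would look something like:

```shell
# Sketch only -- assumes the merge script and flags are unchanged from LLaVA-v1.5.
python scripts/merge_lora_weights.py \
    --model-path /path/to/your/lora/checkpoint \
    --model-base <base model you fine-tuned from> \
    --save-model-path /path/to/merged/checkpoint
```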

If you want to add your own task, you can refer to the new task guide in the docs. You can also look at examples such as mme, ai2d, and llava-in-the-wild to see how to add your dataset.
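
As a rough sketch modeled on the existing configs (e.g. mme.yaml), with field names you should double-check against the task guide, a minimal single-image task YAML could look like this:

```yaml
# Hypothetical my_task.yaml -- verify the exact fields against the docs and existing tasks.
dataset_path: your-username/my-onevision-eval    # any load_dataset-compatible dataset
task: "my_task"
test_split: test
output_type: generate_until
doc_to_visual: !function utils.my_task_doc_to_visual   # returns the image(s) for a doc
doc_to_text: !function utils.my_task_doc_to_text       # builds the text prompt for a doc
doc_to_target: "answer"
generation_kwargs:
  max_new_tokens: 64
  temperature: 0
  do_sample: false
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
```

The doc_to_* entries point to functions in a utils.py placed next to the YAML, as the existing tasks do.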

Bleking commented 1 week ago

> For merging LoRA, I believe the preprocessing is the same. You need a final safetensors file so that the model can be loaded. But I haven't tried LoRA models in our pipeline, so feel free to correct me if I am wrong.
>
> If you want to add your own task, you can refer to the new task guide in the docs. You can also look at examples such as mme, ai2d, and llava-in-the-wild to see how to add your dataset.

This is my result of merge_lora_weights, saved in the LLaVA-NeXT directory.

[screenshot: merged model directory, with the safetensors file underlined]

I underlined the safetensors file in the image. Is this what you meant? Have you gone through the same process yourself?

Bleking commented 1 week ago

Well, I am reading this page at the moment. https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/new_task_guide.md

I can either upload my custom test dataset to Hugging Face or follow the instructions for using a local dataset. Either way, does this require a JSON file containing conversations about the image data? If so, it is quite different from LLaVA v1.5, which requires questions, captions, reference answers, and generated answers.

kcz358 commented 1 week ago

You have to make the dataset compatible with load_dataset. You can check the tools directory for more info on making datasets.
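
As a minimal sketch of what load_dataset-compatible means (all names below are placeholders, not something from our tools), a single-image dataset could be built with the datasets library like this:

```python
from datasets import Dataset, Features, Image, Value

# Hypothetical single-image QA records; the "image" column holds file paths
# that the Image() feature will decode when the dataset is read back.
records = {
    "question_id": ["0001", "0002"],
    "image": ["images/0001.jpg", "images/0002.jpg"],
    "question": ["What is shown in the image?", "How many objects are there?"],
    "answer": ["a cat", "three"],
}

features = Features({
    "question_id": Value("string"),
    "image": Image(),
    "question": Value("string"),
    "answer": Value("string"),
})

ds = Dataset.from_dict(records, features=features)
# Push to the Hub (or save locally with save_to_disk) so a task YAML can point at it.
ds.push_to_hub("your-username/my-onevision-eval")
```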

Bleking commented 1 week ago

https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/tools/make_image_hf_dataset.ipynb

Is this the tool you mentioned?