corentin-ryr / MultiMedEval

A Python tool to evaluate the performance of VLMs in the medical domain.
MIT License

Questions about the format of input and output data #12

Status: Closed (Yanllan closed this 4 months ago)

Yanllan commented 5 months ago

Sorry to bother you again! I have some questions about the format of the input and output data. For the images field, is each entry an image path that should be read with PIL, e.g. PIL.Image.open(path)? And what is the JSON format of the answers? Could you give a specific example? I am very sorry to bother you all the time, and I would really appreciate your reply.

corentin-ryr commented 5 months ago

Hello,

The list of images is a list of Pillow Image objects (not paths).
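In other words (an illustrative sketch, not MultiMedEval's actual loading code; "example.png" is a placeholder file), each element is equivalent to an image already opened with Pillow:

from PIL import Image

# The batcher receives already-loaded images, so there is no path to open;
# each element is equivalent to the result of:
img = Image.open("example.png")
print(isinstance(img, Image.Image))  # True: this is the type the batcher sees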

The answer from the batcher should be a list of strings (one string per sample in the input). Here is an example batcher for the Mistral 7B model implemented using vLLM:

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

class batcherMistral:
    def __init__(self) -> None:
        model_name = "mistralai/Mistral-7B-Instruct-v0.2"
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.tokenizer.pad_token = self.tokenizer.eos_token
        self.LLM = LLM(model_name)

    def __call__(self, prompts):
        # Each element of `prompts` carries an HF-style conversation at index 0;
        # render it to a plain prompt string with the model's chat template.
        model_inputs = [self.tokenizer.apply_chat_template(messages[0], tokenize=False) for messages in prompts]

        # Low-temperature sampling, capped at 400 new tokens per sample.
        outputs = self.LLM.generate(model_inputs, SamplingParams(temperature=0.05, max_tokens=400))
        decoded = [output.outputs[0].text for output in outputs]

        # One answer string per input sample.
        return decoded

It is implemented as a Python callable class. It takes the HF conversations as input, formats them with the chat template of the Mistral model, and returns the text generated by the model.
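As a rough sketch of how such a batcher gets invoked (the exact fields MultiMedEval passes may differ, and the conversation content below is made up), each element of prompts holds the conversation at index 0:

# Illustrative call, assuming each prompt's first element is the conversation:
prompts = [
    [[{"role": "user", "content": "Describe the main finding in one sentence."}]],
]
batcher = batcherMistral()
answers = batcher(prompts)  # list of strings, one per input sample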

I hope this helps, and I will clarify the README in the future.