X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl

Question about running inference #237

Open · SYuan03 opened this issue 1 month ago

SYuan03 commented 1 month ago

Thank you for the work you've done—it's really awesome!

I'm trying to reproduce your results on some datasets and I'm noticing some differences in accuracy. I'm using AutoModelForCausalLM to load the model, and I'm not sure whether that affects the accuracy:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    attn_implementation='flash_attention_2',
    torch_dtype=torch.half,
    trust_remote_code=True,
)
```
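As a quick sanity check on what that call resolved to, I also printed a few standard transformers attributes (nothing model-specific assumed here):

```python
# Continuing from the snippet above: confirm which remote class
# trust_remote_code dispatched to, and that the requested dtype stuck.
print(type(model).__name__)        # class defined in the checkpoint's remote code
print(model.config.architectures)  # architectures declared in config.json
print(model.dtype)                 # expected torch.float16, from torch_dtype=torch.half
```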
LukeForeverYoung commented 1 month ago

Hi, the accuracy of most large language models can be affected by the prompt. We will release the evaluation pipeline in the coming days for easy reproducibility of the results. For a specific task, can you provide more information about your evaluation process, such as the prompts and processors used?
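For instance (a hypothetical illustration; neither template below is our official evaluation prompt), two reasonable formats for the same multiple-choice item can already behave differently:

```python
# Hypothetical illustration only: neither template is the official
# mPLUG-Owl3 evaluation prompt. The point is that small formatting
# choices for the same multiple-choice item can shift accuracy.
question = "What is the man in the image holding?"
options = {"A": "A racket", "B": "A bat", "C": "A club", "D": "A stick"}
option_block = "\n".join(f"{key}. {text}" for key, text in options.items())

# Variant 1: bare question plus options, free-form answer expected.
prompt_v1 = f"{question}\n{option_block}"

# Variant 2: same content, with an explicit output-format instruction.
prompt_v2 = (
    f"{question}\n{option_block}\n"
    "Answer with the option's letter from the given choices directly."
)

print(prompt_v1, prompt_v2, sep="\n\n---\n\n")
```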

SYuan03 commented 1 month ago

> Hi, the accuracy of most large language models can be affected by the prompt. We will release the evaluation pipeline in the coming days for easy reproducibility of the results. For a specific task, can you provide more information about your evaluation process, such as the prompts and processors used?

Thanks for the reply. I'm a developer on the VLMEvalKit project. We're trying to add mPLUG-Owl3 to the project. I have submitted a PR to support the model, but the accuracy on, e.g., the MMBench dataset falls short of expectations. Here is my code.

We'd also appreciate it if you could open a PR to add support yourselves! But the developer's manual would take some time to read, so if it's convenient, could you take a look at what's wrong with my code (the wrapper shape is sketched below)? Thanks a lot.
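For context, the wrapper is shaped roughly like this (a sketch only: the `generate_inner(message, dataset)` signature and the `{'type': ..., 'value': ...}` message format follow my reading of VLMEvalKit's developer manual, and the model-specific generation call is elided):

```python
# Rough shape of the VLMEvalKit wrapper (sketch only). Assumptions:
# generate_inner(message, dataset) and the {'type': ..., 'value': ...}
# message format come from my reading of the developer's manual; the
# model/processor setup and the actual generation call are elided.
class mPLUGOwl3Wrapper:
    def generate_inner(self, message, dataset=None):
        # Split the interleaved message into image paths and the text prompt.
        image_paths = [m['value'] for m in message if m['type'] == 'image']
        prompt = '\n'.join(m['value'] for m in message if m['type'] == 'text')
        # The model-specific part goes here: build the chat input from
        # (image_paths, prompt) and run generation. This is exactly where my
        # prompt construction may diverge from your evaluation pipeline.
        raise NotImplementedError("model-specific generation elided in this sketch")
```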

chancharikmitra commented 1 month ago

Side point regarding this question: can we get confirmation from the authors that what @SYuan03 showed is the proper way to load the model through Hugging Face? I ask because the README does not use AutoModelForCausalLM to load the model. Would appreciate confirmation on that point.

SYuan03 commented 1 month ago

> Side point regarding this question: can we get confirmation from the authors that what @SYuan03 showed is the proper way to load the model through Hugging Face? I ask because the README does not use AutoModelForCausalLM to load the model. Would appreciate confirmation on that point.

Hi, thank you for your attention. I actually noticed this too, and I tried loading the model both ways; they were consistent in my tests.
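The check I ran was along these lines (a sketch: the checkpoint path is a placeholder, it assumes the checkpoint's auto_map registers both entry points, and it only compares weights, not generations):

```python
import torch
from transformers import AutoModel, AutoModelForCausalLM

model_path = 'path/to/mPLUG-Owl3-checkpoint'  # placeholder

m1 = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.half, trust_remote_code=True)
m2 = AutoModel.from_pretrained(
    model_path, torch_dtype=torch.half, trust_remote_code=True)

# If both auto classes dispatch to the same remote class, the two models
# should agree key-for-key and value-for-value.
print(type(m1).__name__, type(m2).__name__)
sd1, sd2 = m1.state_dict(), m2.state_dict()
assert sd1.keys() == sd2.keys()
assert all(torch.equal(sd1[k], sd2[k]) for k in sd1)
print('Both loading paths yield identical weights.')
```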

LukeForeverYoung commented 1 week ago

> > Hi, the accuracy of most large language models can be affected by the prompt. We will release the evaluation pipeline in the coming days for easy reproducibility of the results. For a specific task, can you provide more information about your evaluation process, such as the prompts and processors used?
>
> Thanks for the reply. I'm a developer on the VLMEvalKit project. We're trying to add mPLUG-Owl3 to the project. I have submitted a PR to support the model, but the accuracy on, e.g., the MMBench dataset falls short of expectations. Here is my code.
>
> We'd also appreciate it if you could open a PR to add support yourselves! But the developer's manual would take some time to read, so if it's convenient, could you take a look at what's wrong with my code? Thanks a lot.

Thank you for supporting our models! We have recently released the evaluation pipelines at this link to help you reproduce the evaluation results.