Thanks for your interest. The result is obtained with instructblip-vicuna7b, and we use ChEF for evaluation. Run:
python eval.py --model_cfg=config/ChEF/models/instructblip.yaml --recipe_cfg=config/ChEF/scenario_recipes/ScienceQA/default.yaml
Thanks! I've got some results now. But I still want to confirm: are the leaderboard results all based on "default.yaml" for the corresponding dataset? For example, for "FSC", is the result obtained from "src/config/ChEF/scenario_recipes/FSC147/default.yaml"? I had assumed it came from "src/config/ChEF/scenario_recipes/LAMM/FSC147.yaml", but the results were very different. Is there a detailed explanation of where the leaderboard results come from? If not, could you please sync all the config details to the leaderboard? Since the benchmark is named "LAMM", I directly ran the configs under the "LAMM" folder before, but the results were very different from the leaderboard.
Thanks for your suggestion! The results on the leaderboard are all based on ChEF/scenario_recipes/xxx/default.yaml. Also, for users who want to use the LAMM benchmark for evaluation, we keep the original evaluation configs in ChEF/scenario_recipes/LAMM/xxx.yaml. Note that these evaluation settings are not recommended, as we believe that ChEF provides a fairer and more reasonable evaluation pipeline.
We understand that the difference between the results on the leaderboard and the various configs supported in our code may be confusing. We will provide a more detailed explanation and clean up the configs soon. Thanks again for your suggestion.
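
Concretely, the two settings correspond to two different recipe configs passed to eval.py. Using the FSC147 example from the question above (and assuming the FSC147 recipe is invoked the same way as the ScienceQA one earlier in this thread):

Leaderboard (ChEF) setting:
python eval.py --model_cfg=config/ChEF/models/instructblip.yaml --recipe_cfg=config/ChEF/scenario_recipes/FSC147/default.yaml

Original LAMM setting (kept for reference, not recommended):
python eval.py --model_cfg=config/ChEF/models/instructblip.yaml --recipe_cfg=config/ChEF/scenario_recipes/LAMM/FSC147.yaml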
Thanks for your work! I also got a result of 0.06593951412989589 when running instructblip-vicuna7b on 2d-ScienceQA-LAMM. But when I changed the model architecture to instructblip-flant5xxl, I got 0.66. Why such a big difference? Also, InstructBLIP's ScienceQA result on the leaderboard is 0.5518, so which architecture is that result obtained from?
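
For anyone reproducing this comparison: the backbone is presumably selected in the model config rather than the recipe config. A minimal sketch of what config/ChEF/models/instructblip.yaml might contain (the key names below are assumptions for illustration, not taken from the repo):

# hypothetical model config sketch; the actual keys in the repo may differ
model_name: InstructBLIP            # model wrapper loaded by eval.py
model_path: instructblip-vicuna7b   # swap to instructblip-flant5xxl for the FLAN-T5 backbone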