Alpha-VLLM / LLaMA2-Accessory

An Open-source Toolkit for LLM Development
https://llama2-accessory.readthedocs.io/

Evaluating on BBH/MMLU using some of the FT checkpoints from the model-zoo. #81

Closed · prateeky2806 closed this issue 10 months ago

prateeky2806 commented 10 months ago

Hi, thank you for creating this amazing repository. I have two questions.

  1. I am trying to use some of the existing models listed in the model-zoo and run evaluations on them. However, I am not able to figure out how this can be done. Can you provide a script that can do this?
  2. I am also wondering whether you provide any QLoRA fine-tuned checkpoints. I would like to load these QLoRA modules and then run BBH/MMLU evaluation on the resulting model.

It would be great if you could help with this.

Thanks, Prateek

kriskrisliu commented 10 months ago

Hi there,

Thank you for the kind words. To answer your questions:

  1. You can find the evaluation guidance on this page.

  2. Any of the checkpoints in the model zoo can be loaded with the quantization method by using this function (a sketch of the general idea follows this list). However, the evaluation scripts are not yet wired up to the quantization implementation; I plan to submit a pull request this weekend to address this.
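
To give a feel for what that quantization does, here is a minimal, self-contained PyTorch sketch. It is illustrative only: the function names and the simple per-channel affine scheme are mine, not the repository's actual helper (which, like QLoRA, uses NF4 quantization and packs two 4-bit values per byte).

```python
# Illustrative only: NOT the repository's quantization helper.
import torch

def quantize_4bit(w: torch.Tensor):
    # Per-output-channel scale mapping each row into the 4-bit range [-7, 7].
    scale = w.abs().amax(dim=1, keepdim=True) / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)  # real code packs 2 values per byte
    return q, scale

def dequantize_4bit(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Restore an fp16 approximation of the original weight for the forward pass.
    return q.to(torch.float16) * scale.to(torch.float16)

w = torch.randn(4096, 4096)             # stands in for a base-model weight matrix
q, scale = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale)
print((w - w_hat.float()).abs().max())  # worst-case error is about scale / 2
```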

If you have any further questions or concerns, please feel free to let me know.

kriskrisliu commented 10 months ago

I've checked the code and revisited the BBH implementation; the other evaluation scripts are still being revisited. If you would like to evaluate on BBH, the running script could look like the following. In this case, it initializes a 7B LLaMA 2 model and loads the official LLaMA 2 weights as well as PEFT weights (3.35 MB, alpaca_llamaPeft_normBias). Since the --quant flag is activated, the base model is quantized to 4-bit using the QLoRA implementation, while the PEFT weights stay in higher precision (fp32/fp16/bf16); a small sketch of this precision split follows the command.

```bash
cd LLaMA2-Accessory/light-eval
torchrun --nproc-per-node=1 --master_port 23456 src/eval_bbh.py \
    --llama_type llama_peft \
    --llama_config <path-to>/Llama-2-7b/params.json \
    --tokenizer_path <path-to>/Llama-2-7b/tokenizer.model \
    --pretrained_path <path-to>/Llama-2-7b/ <path-to>/alpaca_llamaPeft_normBias \
    --data_dir data/BIG-Bench-Hard \
    --quant
```
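
To make the precision split concrete, here is a small runnable sketch. The keyword filter is hypothetical (the real split is decided by the llama_peft implementation, not by string matching), but for a normBias-style checkpoint the trained parameters are exactly the norms and biases:

```python
import torch.nn as nn

# Hypothetical name filter for illustration only.
PEFT_KEYWORDS = ("norm", "bias", "lora")

def split_params(model: nn.Module):
    """Partition parameter names into base weights (candidates for 4-bit
    quantization) and PEFT weights (kept in fp32/fp16/bf16)."""
    base, peft = [], []
    for name, _ in model.named_parameters():
        (peft if any(k in name for k in PEFT_KEYWORDS) else base).append(name)
    return base, peft

class ToyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)
        self.norm = nn.LayerNorm(8)

base_names, peft_names = split_params(ToyBlock())
print(base_names)  # ['proj.weight']                           -> quantized to 4-bit
print(peft_names)  # ['proj.bias', 'norm.weight', 'norm.bias'] -> stay high precision
```
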
prateeky2806 commented 10 months ago

Thank you so much! I was wondering if you could provide a way to tell which checkpoints are LoRA/QLoRA checkpoints and which are fully fine-tuned models, so that it is easier for people to select the correct models from the model zoo.

ChrisLiu6 commented 10 months ago

> Thank you so much! I was wondering if you could provide a way to tell which checkpoints are LoRA/QLoRA checkpoints and which are fully fine-tuned models, so that it is easier for people to select the correct models from the model zoo.

Thank you for your suggestion. In fact, we have been doing so: parameter-efficient fine-tuning checkpoints are generally labeled with "peft" or "llamaAdapter", e.g. alpacaLlava_llamaQformerv2Peft_13b and alpaca_llamaPeft_normBias, while those without special labels are full-parameter fine-tuned models. A small name filter like the one below illustrates the convention.
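
A minimal sketch, assuming only the substring convention described above (the helper name and the sample list are illustrative):

```python
# Hypothetical helper built on the naming convention above; the sample
# checkpoint names are illustrative.
PEFT_TAGS = ("Peft", "peft", "llamaAdapter")

def is_peft_checkpoint(name: str) -> bool:
    # PEFT checkpoints carry a "peft"/"llamaAdapter" tag; everything else
    # in the model zoo is a full-parameter fine-tune.
    return any(tag in name for tag in PEFT_TAGS)

for ckpt in ("alpaca_llamaPeft_normBias",
             "alpacaLlava_llamaQformerv2Peft_13b",
             "alpaca"):
    print(ckpt, "->", "PEFT" if is_peft_checkpoint(ckpt) else "full-parameter")
```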