Lichang-Chen / AlpaGasus

A better Alpaca Model Trained with Less Data (only 9k instructions of the original set)
https://lichang-chen.github.io/AlpaGasus/
19 stars 3 forks source link

Some questions about the setups of grader #2

Closed gauss5930 closed 1 year ago

gauss5930 commented 1 year ago

First, I want to convey the message that AlpaGasus inspired me a lot. Thank you for introducing wonderful research!!

At the moment, I am progressing with the project to make the QLoRA version of AlpaGasus, the evaluation of our model only has left. However, I have some questions about proceeding with the evaluation with GPT-4, so I leave the message in Issues!

As you see in the title of this issues, the content of the question is related to the setup of grader grading the response of models. In the paper, as far as I know, there are not any specific setups of grader such as temperature, top_p, and max_tokens. For better comprehension and research, I want to know the specific setups of grader(GPT-4)!! Could you let me know them??

Lichang-Chen commented 1 year ago

Thanks for your interest! Here is the setup for the grader(GPT-4):

  1. temperature=0.0
  2. top_p=1.0
  3. max-token=256
  4. Use the engine: GPT-4-0613

If you have any other questions, please let me know! BTW: when your Qlora version of Alpagasus is ready, please let me know. I would consider citing your repo on the Alpagasus homepage if it works well!

gauss5930 commented 1 year ago

Thank you for your kind reply, and thank you for saying that you would cite me if I implemented the model using QLoRA!! After proceeding with evaluation and solving minor issues, I'll let you know about my repository.

By the way, I had a question again while writing the code to output the response of the model to evaluate, so I am asking again. There are several test data to output the response of the models, then what prompt did you use? The prompt of each test set? or another prompt?

Lichang-Chen commented 1 year ago

Please refer to another unofficial AlpaGasus repo. I checked their code and they did the evaluation quite well.