Hi, may I know why the hyperparameters of the training command in Llama-X (this repo) and Alpaca are different? E.g., the batch size is 128 vs. 512 (64*8), and the warmup is 0.03 (as a ratio) vs. 2 (steps).
Which hyperparameters should we adopt?
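To make the comparison concrete, here is a minimal sketch of the two configurations using HuggingFace `TrainingArguments` (the per-device/accumulation split is my illustrative guess; only the effective batch sizes and warmup values are taken from the two READMEs):

```python
from transformers import TrainingArguments

# Llama-X (this repo): effective batch = 2 per device * 8 accum * 8 GPUs = 128,
# warmup expressed as a ratio of total training steps.
llama_x_args = TrainingArguments(
    output_dir="out-llama-x",
    per_device_train_batch_size=2,   # illustrative split
    gradient_accumulation_steps=8,   # illustrative split
    warmup_ratio=0.03,
)

# Alpaca: effective batch = 64 per device * 1 accum * 8 GPUs = 512,
# warmup expressed as a fixed number of optimizer steps.
alpaca_args = TrainingArguments(
    output_dir="out-alpaca",
    per_device_train_batch_size=64,  # illustrative split
    gradient_accumulation_steps=1,   # illustrative split
    warmup_steps=2,
)
```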
Another question: what is Llama-i (7B) in the Llama-X Evaluation section? Its GSM8K result is 18.8%, while my own Llama-X model (trained with the hyperparameters in this repo) only reaches 10%. I'm not sure why the gap is so large. Would you mind sharing your GSM8K evaluation script for Llama-X? Thank you.
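In case the gap comes from evaluation rather than training, here is roughly what my eval looks like (a minimal sketch; the checkpoint path is a placeholder, and I'm assuming zero-shot prompting, greedy decoding, and taking the last number in the generation as the prediction):

```python
import re
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/my-llama-x-7b"  # placeholder: my local checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

def last_number(text):
    """Take the last numeric token in the generation as the prediction."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(nums[-1]) if nums else None

data = load_dataset("gsm8k", "main", split="test")
correct = 0
for ex in data:
    prompt = f"Question: {ex['question']}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    completion = tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    # GSM8K gold answers end with "#### <number>".
    gold = float(ex["answer"].split("####")[-1].strip().replace(",", ""))
    if last_number(completion) == gold:
        correct += 1
print(f"accuracy: {correct / len(data):.1%}")
```

If your script differs (few-shot prompting, a different answer-extraction rule, etc.), that alone could explain several points of the gap, so I'd appreciate seeing the exact setup behind the 18.8%.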