Hi, may I know why the hyperparameters of the training command in Llama-X (this repo) and Alpaca are different? E.g., the batch size is 128 vs. 512 (64*8), and the warmup is 0.03 (as a ratio) vs. 2 (steps).
Which hyperparameters should we adopt?
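To make the comparison concrete, here is a minimal sketch of the two configurations using HuggingFace `TrainingArguments` (the per-device/accumulation split is my illustrative guess; only the effective batch sizes and warmup values are taken from the two READMEs):

```python
from transformers import TrainingArguments

# Llama-X (this repo): effective batch = 2 per device * 8 accum * 8 GPUs = 128,
# warmup expressed as a ratio of total training steps.
llama_x_args = TrainingArguments(
    output_dir="out-llama-x",
    per_device_train_batch_size=2,   # illustrative split
    gradient_accumulation_steps=8,   # illustrative split
    warmup_ratio=0.03,
)

# Alpaca: effective batch = 64 per device * 1 accum * 8 GPUs = 512,
# warmup expressed as a fixed number of optimizer steps.
alpaca_args = TrainingArguments(
    output_dir="out-alpaca",
    per_device_train_batch_size=64,  # illustrative split
    gradient_accumulation_steps=1,   # illustrative split
    warmup_steps=2,
)
```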
Another question: what is Llama-i (7B) in the Llama-X Evaluation section? Its GSM8K result is 18.8%, while my own Llama-X model (trained with the hyperparameters in this repo) only reaches 10%. I'm not sure why the gap is so large. Would you mind sharing your GSM8K evaluation script for Llama-X? Thank you.
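In case the gap comes from evaluation rather than training, here is roughly what my eval looks like (a minimal sketch; the checkpoint path is a placeholder, and I'm assuming zero-shot prompting, greedy decoding, and taking the last number in the generation as the prediction):

```python
import re
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/my-llama-x-7b"  # placeholder: my local checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

def last_number(text):
    """Take the last numeric token in the generation as the prediction."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(nums[-1]) if nums else None

data = load_dataset("gsm8k", "main", split="test")
correct = 0
for ex in data:
    prompt = f"Question: {ex['question']}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    completion = tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    # GSM8K gold answers end with "#### <number>".
    gold = float(ex["answer"].split("####")[-1].strip().replace(",", ""))
    if last_number(completion) == gold:
        correct += 1
print(f"accuracy: {correct / len(data):.1%}")
```

If your script differs (few-shot prompting, a different answer-extraction rule, etc.), that alone could explain several points of the gap, so I'd appreciate seeing the exact setup behind the 18.8%.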