
YeonwooSung / ai_book

AI book for everyone


Mistral-7b, Zephyr-7b-alpha #52

Open · YeonwooSung opened this issue 11 months ago

YeonwooSung commented 11 months ago

Mistral-7b-v0.1, Zephyr-7b-alpha

- Mistral-7b outperformed Llama2-13b-hf and gpt-3.5-turbo
- Zephyr-7b-alpha outperformed Mistral-7b and beat Llama2-70b

DPO (Direct Preference Optimization) vs PPO (Proximal Policy Optimization): is DPO better for fine-tuning?

Zephyr-7b-alpha is a fine-tuned version of Mistral-7b, trained with the DPO trainer.
- Uses a subset of the UltraFeedback dataset
- The Hugging Face team found that PPO is fragile with respect to hyperparameters, while DPO is robust to them
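To make the DPO side of the comparison concrete, here is a minimal sketch of the standard DPO loss for a batch of preference pairs. For each prompt x with a preferred response y_w and a rejected response y_l, the loss is -log σ(β · [(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]), where π_θ is the model being trained and π_ref is a frozen reference model (for Zephyr, the SFT checkpoint of Mistral-7b). This is not the Zephyr training code: it assumes the summed per-response log-probabilities are already computed, and `beta=0.1` is only an illustrative value. Unlike PPO, there is no separate reward model and no on-policy sampling loop, only a classification-style loss on fixed pairs, which is one plausible reason for the robustness noted above.

```python
import math


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)) for large positive or negative z.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Mean DPO loss over a batch (illustrative sketch).

    Each argument is a list of summed log-probabilities log pi(y|x) of a whole
    response, one entry per (prompt, chosen, rejected) triple. `beta` scales
    how strongly the policy is pushed away from the reference model.
    """
    losses = []
    for pc, pr, rc, rr in zip(policy_chosen_logp, policy_rejected_logp,
                              ref_chosen_logp, ref_rejected_logp):
        # Log-ratio of policy vs. reference for the chosen and rejected answers.
        chosen_ratio = pc - rc
        rejected_ratio = pr - rr
        # Implicit reward margin; a larger margin means a lower loss.
        margin = beta * (chosen_ratio - rejected_ratio)
        losses.append(-log_sigmoid(margin))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Policy identical to reference: margin is 0, so the loss is log(2).
    print(dpo_loss([0.0], [0.0], [0.0], [0.0]))
    # Policy already prefers the chosen answer more than the reference does.
    print(dpo_loss([-1.0], [-5.0], [-3.0], [-3.0]))
```

When the policy still equals the reference model, every margin is zero and the loss is log 2; it drops below that as the policy raises the chosen response's likelihood relative to the rejected one.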

YeonwooSung commented 11 months ago

Source code for the Mistral LLM