YeonwooSung/ai_book: AI book for everyone
Mistral-7b, Zephyr-7b-alpha #52 (Open)
YeonwooSung opened 11 months ago
YeonwooSung commented 11 months ago
Mistral-7b-v0.1, Zephyr-7b-alpha
Mistral-7b outperformed Llama2-13b-hf and gpt-3.5-turbo
Zephyr-7b-alpha outperformed Mistral-7b and beat Llama2-70b
DPO vs PPO (DPO is better for finetuning?)
Zephyr-7b-alpha is a fine-tuned version of Mistral-7b, trained with the DPO trainer.
Uses a subset of the UltraFeedback dataset.
The Hugging Face team found that PPO is sensitive to hyperparameters, while DPO is robust to them.
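To make the DPO side of the comparison concrete, here is a minimal sketch of the DPO loss for a single preference pair, written in plain Python. It is not the Zephyr training code: the function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions. The inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Hypothetical helper for illustration; beta=0.1 is an assumed value,
    not one taken from the issue.
    """
    # How much more (in log space) the policy likes each response
    # than the frozen reference model does.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # Scaled preference margin: positive when the policy favors the
    # chosen response more strongly than the reference does.
    margin = beta * (chosen_logratio - rejected_logratio)

    # loss = -log(sigmoid(margin)), computed in a numerically stable way.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, loss is log(2).
baseline = dpo_loss(-1.0, -2.0, -1.0, -2.0)

# Policy shifted toward the chosen response: the loss drops below log(2).
improved = dpo_loss(-0.5, -3.0, -1.0, -2.0)
```

The loss is an ordinary supervised objective over preference pairs, with no reward model or sampling loop, which is one plausible reason it has fewer hyperparameters to tune than PPO.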
YeonwooSung commented 11 months ago
source code for mistral llm