For each method, there are seven models available: pythia-{1.4, 2.8, 6.9, 12.0}B and llama-{7, 13, 30}B, all of which have been aligned under nearly identical settings on {Anthropic HH, Open Assistant, SHP 1.0} data.
The implied reward for both DPO- and KTO-aligned models is $\beta \log \frac{\pi_\theta(y|x)}{\pi_\text{ref}(y|x)}$, where $\pi_\theta$ is the aligned model and $\pi_\text{ref}$ is the reference model.
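As a rough illustration of the implied reward $\beta \log \frac{\pi_\theta(y|x)}{\pi_\text{ref}(y|x)}$, the sketch below computes it in log space from per-token log-probabilities of a completion $y$ under the aligned model and the reference model. The function names, the toy log-probabilities, and the value `beta=0.1` are all assumptions made for illustration; they are not taken from the Archangel training setup, and in practice the log-probabilities would come from scoring the same completion tokens with both models.

```python
def sequence_logprob(token_logprobs):
    """Sum per-token log-probabilities to get log pi(y|x) for a completion y."""
    return sum(token_logprobs)


def implied_reward(policy_token_logprobs, ref_token_logprobs, beta=0.1):
    """Return beta * log(pi_theta(y|x) / pi_ref(y|x)), computed in log space.

    beta=0.1 is a placeholder assumption, not a value stated in this card.
    """
    if len(policy_token_logprobs) != len(ref_token_logprobs):
        raise ValueError("both models must score the same completion tokens")
    log_ratio = sequence_logprob(policy_token_logprobs) - sequence_logprob(ref_token_logprobs)
    return beta * log_ratio


# Made-up per-token log-probs for a three-token completion (illustrative only).
policy_lp = [-0.5, -1.0, -0.25]
ref_lp = [-1.0, -1.5, -0.75]
reward = implied_reward(policy_lp, ref_lp, beta=0.1)
print(reward)
```

A positive value means the aligned model assigns the completion a higher probability than the reference model does; a value of zero means the two models agree on it.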
The reference model for each set of models in Archangel is as follows:
- For the SFT+DPO model `ContextualAI/archangel_sft-dpo_{model}`, the reference is `ContextualAI/archangel_sft_{model}`.
- For the SFT+KTO model `ContextualAI/archangel_sft-kto_{model}`, the reference is `ContextualAI/archangel_sft_{model}`.
- For the DPO model without SFT, `ContextualAI/archangel_dpo_llama7b`, the reference is `huggyllama/llama-7b`, which can be found in the `_name_or_path` field in `config.json`.
- For the KTO model without SFT, `ContextualAI/archangel_kto_llama7b`, the reference is `huggyllama/llama-7b`, which can be found in the `_name_or_path` field in `config.json`.
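The mapping from an aligned model to its reference can be sketched as a small lookup. This is a minimal sketch: the helper name `reference_model_for` is hypothetical, and it only covers the cases listed here. For any other model without SFT, it raises an error, since the reference would have to be read from the `_name_or_path` field of that model's `config.json`.

```python
# Prefixes of SFT+DPO and SFT+KTO model names; their reference is the SFT model.
SFT_PREFIXES = ("archangel_sft-dpo_", "archangel_sft-kto_")

# Models aligned without SFT whose reference is stated explicitly in this card.
NO_SFT_REFERENCES = {
    "ContextualAI/archangel_dpo_llama7b": "huggyllama/llama-7b",
    "ContextualAI/archangel_kto_llama7b": "huggyllama/llama-7b",
}


def reference_model_for(model_id):
    """Return the reference model ID for an Archangel model (hypothetical helper)."""
    if model_id in NO_SFT_REFERENCES:
        return NO_SFT_REFERENCES[model_id]
    org, _, name = model_id.partition("/")
    if org == "ContextualAI":
        for prefix in SFT_PREFIXES:
            if name.startswith(prefix):
                # Swap the "sft-dpo"/"sft-kto" prefix for plain "sft", keep the suffix.
                return f"{org}/archangel_sft_{name[len(prefix):]}"
    raise ValueError(
        f"no known reference for {model_id!r}; check _name_or_path in its config.json"
    )


sft_dpo_ref = reference_model_for("ContextualAI/archangel_sft-dpo_llama7b")
kto_ref = reference_model_for("ContextualAI/archangel_kto_llama7b")
print(sft_dpo_ref, kto_ref)
```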
The Archangel suite of models contains DPO, SFT+DPO, KTO, and SFT+KTO models, which can also be used as reward models: https://huggingface.co/collections/ContextualAI/archangel-65bd45029fa020161b052430