eric-mitchell / direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
Apache License 2.0 · 2.06k stars · 167 forks
Issues
#89 · In DPO training, I got this: train stats after 160768 examples: {'rewards_train/chosen': 'nan', 'rewards_train/rejected': 'nan', 'rewards_train/accuracies': '0', 'rewards_train/margins': 'nan', 'logps_train/rejected': 'nan', 'logps_train/chosen': 'nan', 'loss/train': 'nan', 'examples_per_second': '5.4876', 'grad_norm': 'nan', 'counters/examples': 160768, 'counters/updates': 5024} · opened 1 week ago by Alan-D-Chen · 2 comments
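Stats like those in #89, where every reward and log-probability field reads 'nan', can be triaged mechanically. The helper below is a hypothetical sketch (not part of this repo) that scans such a stats dict, handling the string-valued numbers seen in the log above:

```python
import math

def find_nan_metrics(stats):
    """Return the metric names in a train-stats dict whose values are NaN.

    Values may be floats or strings such as 'nan', as in the log in #89.
    Non-numeric values are skipped. Hypothetical helper, not from the repo.
    """
    bad = []
    for name, value in stats.items():
        try:
            if math.isnan(float(value)):
                bad.append(name)
        except (TypeError, ValueError):
            pass  # non-numeric entries are ignored
    return bad
```

Once the first NaN metric and step are identified, the usual suspects are an exploding `grad_norm`, a too-high learning rate, or reduced-precision overflow; which applies here would need the full training config.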
#88 · GPT4 prompt when evaluating DPO · opened 4 weeks ago by kygguo · 0 comments
#87 · How to guarantee the output.logits.shape[:-1] == labels.shape · opened 1 month ago by foreverhell · 0 comments
#86 · update extract_anthropic_prompt · opened 1 month ago by ZhiyuLi-goog · 0 comments
#85 · Training process got stuck when loss=dpo sample_during_eval=true trainer=FSDPTrainer · closed 2 months ago by kygguo · 1 comment
#84 · AB test training update and initial metrics on TruthfulQA · closed 4 months ago by lesnikow · 0 comments
#83 · How are evals done on trained models? · opened 4 months ago by lesnikow · 0 comments
#82 · Hi @eric-mitchell , · closed 3 weeks ago by Gryff1ndor · 3 comments
#81 · Where is the config document of IPO? · opened 5 months ago by 3244we · 1 comment
#80 · Using Mistral 7B with transformers v4.38.1 on MATH dataset, and facing memory leaks · opened 5 months ago by Jayant1234 · 0 comments
#79 · Initial commit src/data.py · closed 5 months ago by lesnikow · 0 comments
#78 · Division by Zero error sporadically occurs · closed 5 months ago by Jayant1234 · 1 comment
#77 · Weird logits and model starts degenerating while training DPO · opened 5 months ago by DungNasSa10 · 2 comments
#76 · Was it your intention to recreate wandb tables in iterator? · opened 6 months ago by huskydoge · 0 comments
#75 · Can DPO work on BERT-style Model? · opened 6 months ago by Leo-T-Zang · 0 comments
#74 · Hyperparameter experiments · closed 6 months ago by j-c-carr · 1 comment
#73 · The number of training steps in the SHP dataset · opened 6 months ago by bonin147 · 0 comments
#72 · Computing faster logps · opened 6 months ago by alexvishnevskiy · 3 comments
#71 · Implementation for Plackett-Luce rank model · opened 7 months ago by rohan598 · 1 comment
#70 · What's the reference policy of Preferred-FT in Figure 2? · opened 7 months ago by zetian1025 · 0 comments
#69 · My Code to Reproduce IMDB · opened 7 months ago by QiyaoWei · 0 comments
#68 · Why does SFT sum the cross-entropy loss within each sequence? · opened 7 months ago by YJWon99 · 1 comment
#67 · Using cross entropy loss to calculate DPO? · opened 7 months ago by zachares · 2 comments
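Several of the issues above (#67 on cross-entropy, #68 on summed log-probs) turn on what the DPO objective actually computes. As a point of reference, here is a minimal pure-Python sketch of the per-pair DPO loss as defined in the paper, written from the published formula rather than copied from this repo's trainers.py:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin).

    margin = (policy/reference log-ratio of the chosen response)
           - (policy/reference log-ratio of the rejected response).
    This is binary cross-entropy with the chosen response as the positive
    class, which is the sense in which DPO "is" a cross-entropy loss (#67).
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) computed stably as log1p(exp(-x))
    return math.log1p(math.exp(-margin))
```

Note that when the policy and reference models coincide, the margin is 0 and the loss is log 2 ≈ 0.693 per pair, not 0 (cf. issue #46 further down this list).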
#66 · Unable to Run SFT · opened 7 months ago by Rui-Yuan91 · 3 comments
#65 · Bug in loading Llama tokenizer? · closed 8 months ago by ajyl · 1 comment
#64 · Question about IPO loss vs DPO loss · opened 8 months ago by MoonBlvd · 1 comment
#63 · Appendix A.4 of the paper: the derived gradient is not consistent with the main text · closed 8 months ago by yflyzhang · 2 comments
#62 · Reproducing Win Rate inference for TL;DR · opened 8 months ago by jdchang1 · 1 comment
#61 · Question regarding the logits in the `_get_batch_logps` function · closed 9 months ago by vgoklani · 2 comments
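Questions about `_get_batch_logps` and `average_log_prob` recur throughout this list (#61, #57, #48, #44, #40). The snippet below is a simplified, single-sequence analogue of what such a function computes, written here for illustration; the repo's actual implementation operates on batched torch tensors:

```python
import math

def get_sequence_logp(token_logits, labels, ignore_index=-100,
                      average_log_prob=False):
    """Per-sequence log-probability from per-step logits and target labels.

    For each position: log-softmax the logits, take the label token's
    log-prob, and skip positions whose label is ignore_index (prompt or
    padding). Sum over the sequence, or average when average_log_prob
    is True. Illustrative sketch, not the repo's torch implementation.
    """
    total, count = 0.0, 0
    for logits, label in zip(token_logits, labels):
        if label == ignore_index:
            continue  # masked positions contribute nothing
        log_z = math.log(sum(math.exp(x) for x in logits))  # log-softmax denominator
        total += logits[label] - log_z
        count += 1
    return total / count if average_log_prob else total
```

The `average_log_prob` flag is what the average_log_prob issues below ask about: summing favors no length normalization (longer sequences get more negative log-probs), while averaging divides by the number of unmasked tokens.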
#60 · Possible Inconsistency (Possibly Typo) in Gradient Definition Between Eq. 7 and Appendix A.4 in DPO paper · closed 9 months ago by rustic-snob · 1 comment
#59 · Unable to run the code for Step 2: Run SFT · closed 9 months ago by ppsmk388 · 1 comment
#58 · Question about fine-tuning steps (epochs) · closed 9 months ago by gyuwon12 · 2 comments
#57 · Question about _get_batch_logps of trainers.py · closed 9 months ago by wulaoshi · 3 comments
#56 · DPO did not achieve the expected experimental effect · opened 10 months ago by Vance0124 · 2 comments
#55 · Training cost: RLHF vs DPO · closed 9 months ago by kartheekmedathati · 1 comment
#54 · How to re-implement the result of IMDB sentiment generation · opened 10 months ago by junkangwu · 0 comments
#53 · Llama-2-13b-chat valid reward accuracy remains ~50% · opened 11 months ago by nxphi47 · 0 comments
#52 · Qwen model issues & embedding and loss has NaN · opened 11 months ago by lylcst · 4 comments
#51 · Error when following the README to train SFT on multiple cards using FSDPTrainer · opened 11 months ago by NekoMimiUnagi · 1 comment
#50 · Pythia-2.8B model weights · closed 11 months ago by alexv-cerebras · 2 comments
#49 · How to load the trained model for inference? · closed 10 months ago by VibhuAg · 2 comments
#48 · Question about average_log_prob · opened 11 months ago by LSX-Sneakerprogrammer · 9 comments
#47 · No such file or directory: json-train-00000-00000-of-NNNNN.arrow · opened 11 months ago by qingerVT · 1 comment
#46 · Loss is 0 when policy and reference models are the same · closed 11 months ago by luffycodes · 3 comments
#45 · Questions about the IMDB Sentiment dataset · opened 11 months ago by stevie1023 · 3 comments
#44 · Questions about the average_log_prob parameter · opened 12 months ago by liumingzhu6060 · 0 comments
#43 · Is fine-tuning with e.g. LoRA supported? · opened 1 year ago by Emerald01 · 1 comment
#42 · llama7B issue · opened 1 year ago by JiuhaiChen · 15 comments
#41 · Strange loss pattern · opened 1 year ago by puyuanOT · 0 comments
#40 · Question about average_log_prob · opened 1 year ago by Kyeongpil · 5 comments