eric-mitchell / direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
Apache License 2.0 · 2.18k stars · 180 forks
Issues
#91 · When trying to reproduce the complete example, "NotImplementedError: offload_to_cpu=True and NO_SHARD is not supported yet" is thrown · ZSvedic · opened 5 days ago · 0 comments
#90 · ValueError when using PEFT on FSDPTrainer · AragornHorse · opened 1 week ago · 0 comments
#89 · In DPO training, I got this: train stats after 160768 examples: {'rewards_train/chosen': 'nan', 'rewards_train/rejected': 'nan', 'rewards_train/accuracies': '0', 'rewards_train/margins': 'nan', 'logps_train/rejected': 'nan', 'logps_train/chosen': 'nan', 'loss/train': 'nan', 'examples_per_second': '5.4876', 'grad_norm': 'nan', 'counters/examples': 160768, 'counters/updates': 5024} · Alan-D-Chen · opened 1 month ago · 2 comments
#88 · GPT-4 prompt when evaluating DPO · kygguo · opened 2 months ago · 0 comments
#87 · How to guarantee that output.logits.shape[:-1] == labels.shape · foreverhell · opened 3 months ago · 0 comments
#86 · Update extract_anthropic_prompt · ZhiyuLi-goog · opened 3 months ago · 0 comments
#85 · Training process got stuck when loss=dpo sample_during_eval=true trainer=FSDPTrainer · kygguo · closed 3 months ago · 1 comment
#84 · A/B test training update and initial metrics on TruthfulQA · lesnikow · closed 5 months ago · 0 comments
#83 · How are evals done on trained models? · lesnikow · opened 5 months ago · 0 comments
#82 · Hi @eric-mitchell, · Gryff1ndor · closed 2 months ago · 3 comments
#81 · Where is the config file for IPO? · 3244we · opened 6 months ago · 1 comment
#80 · Using Mistral 7B with transformers v4.38.1 on the MATH dataset, and facing memory leaks · Jayant1234 · opened 6 months ago · 0 comments
#79 · Initial commit src/data.py · lesnikow · closed 6 months ago · 0 comments
#78 · Division by zero error sporadically occurs · Jayant1234 · closed 6 months ago · 1 comment
#77 · Weird logits and model starts degenerating while training DPO · DungNasSa10 · opened 7 months ago · 2 comments
#76 · Was it your intention to recreate wandb tables in iterator? · huskydoge · opened 7 months ago · 0 comments
#75 · Can DPO work on BERT-style models? · Leo-T-Zang · opened 7 months ago · 0 comments
#74 · Hyperparameter experiments · j-c-carr · closed 8 months ago · 1 comment
#73 · The number of training steps in the SHP dataset · bonin147 · opened 8 months ago · 0 comments
#72 · Computing faster logps · alexvishnevskiy · opened 8 months ago · 3 comments
#71 · Implementation for Plackett-Luce rank model · rohan598 · opened 8 months ago · 1 comment
#70 · What's the reference policy of Preferred-FT in Figure 2? · zetian1025 · opened 8 months ago · 0 comments
#69 · My code to reproduce IMDB · QiyaoWei · opened 8 months ago · 0 comments
#68 · Why does SFT sum the cross-entropy loss within each sequence? · YJWon99 · opened 9 months ago · 1 comment
#67 · Using cross-entropy loss to calculate DPO? · zachares · opened 9 months ago · 2 comments
#66 · Unable to run SFT · Rui-Yuan91 · opened 9 months ago · 3 comments
#65 · Bug in loading Llama tokenizer? · ajyl · closed 9 months ago · 1 comment
#64 · Question about IPO loss vs DPO loss · MoonBlvd · opened 9 months ago · 1 comment
#63 · Appendix A.4 of the paper: the derived gradient is not consistent with the main text · yflyzhang · closed 9 months ago · 2 comments
#62 · Reproducing Win Rate inference for TL;DR · jdchang1 · opened 10 months ago · 1 comment
#61 · Question regarding the logits in the `_get_batch_logps` function · vgoklani · closed 10 months ago · 2 comments
#60 · Possible inconsistency (possibly a typo) in gradient definition between Eq. 7 and Appendix A.4 in DPO paper · rustic-snob · closed 10 months ago · 1 comment
#59 · Unable to run the code for Step 2: Run SFT · ppsmk388 · closed 10 months ago · 1 comment
#58 · Question about fine-tuning steps (epochs) · gyuwon12 · closed 10 months ago · 2 comments
#57 · Question about _get_batch_logps of trainers.py · wulaoshi · closed 10 months ago · 3 comments
#56 · DPO did not achieve the expected experimental effect · Vance0124 · opened 11 months ago · 2 comments
#55 · Training cost: RLHF vs DPO · kartheekmedathati · closed 10 months ago · 1 comment
#54 · How to re-implement the result of IMDB sentiment generation · junkangwu · opened 1 year ago · 0 comments
#53 · Llama-2-13b-chat valid reward accuracy remains ~50% · nxphi47 · opened 1 year ago · 0 comments
#52 · Qwen model issues: embedding and loss contain NaN · lylcst · opened 1 year ago · 5 comments
#51 · Error when following the README to train SFT on multiple cards using FSDPTrainer · NekoMimiUnagi · opened 1 year ago · 2 comments
#50 · Pythia2.8B model weights · alexv-cerebras · closed 1 year ago · 2 comments
#49 · How to load a trained model for inference? · VibhuAg · closed 11 months ago · 2 comments
#48 · Question about average_log_prob · LSX-Sneakerprogrammer · opened 1 year ago · 9 comments
#47 · No such file or directory: json-train-00000-00000-of-NNNNN.arrow · qingerVT · opened 1 year ago · 1 comment
#46 · Loss is 0 when policy and reference models are the same · luffycodes · closed 1 year ago · 3 comments
#45 · Questions about the IMDB Sentiment dataset · stevie1023 · opened 1 year ago · 3 comments
#44 · Questions about the average_log_prob parameter · liumingzhu6060 · opened 1 year ago · 0 comments
#43 · Is fine-tuning with, e.g., LoRA supported? · Emerald01 · opened 1 year ago · 1 comment
#42 · Llama 7B issue · JiuhaiChen · opened 1 year ago · 15 comments