eric-mitchell / direct-preference-optimization
Reference implementation for DPO (Direct Preference Optimization)
Apache License 2.0 · 2.06k stars · 167 forks
Issues
#89 · In DPO training, I got this: train stats after 160768 examples: {'rewards_train/chosen': 'nan', 'rewards_train/rejected': 'nan', 'rewards_train/accuracies': '0', 'rewards_train/margins': 'nan', 'logps_train/rejected': 'nan', 'logps_train/chosen': 'nan', 'loss/train': 'nan', 'examples_per_second': '5.4876', 'grad_norm': 'nan', 'counters/examples': 160768, 'counters/updates': 5024} · opened 1 week ago by Alan-D-Chen · 2 comments
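Stats like those in #89, where every reward and log-probability field reads 'nan', can be triaged mechanically. The helper below is a hypothetical sketch (not part of this repo) that scans such a stats dict, handling the string-valued numbers seen in the log above:

```python
import math

def find_nan_metrics(stats):
    """Return the metric names in a train-stats dict whose values are NaN.

    Values may be floats or strings such as 'nan', as in the log in #89.
    Non-numeric values are skipped. Hypothetical helper, not from the repo.
    """
    bad = []
    for name, value in stats.items():
        try:
            if math.isnan(float(value)):
                bad.append(name)
        except (TypeError, ValueError):
            pass  # non-numeric entries are ignored
    return bad
```

Once the first NaN metric and step are identified, the usual suspects are an exploding `grad_norm`, a too-high learning rate, or reduced-precision overflow; which applies here would need the full training config.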
#88 · GPT4 prompt when evaluating DPO · opened 4 weeks ago by kygguo · 0 comments
#87 · How to guarantee the output.logits.shape[:-1] == labels.shape · opened 1 month ago by foreverhell · 0 comments
#86 · update extract_anthropic_prompt · opened 1 month ago by ZhiyuLi-goog · 0 comments
#85 · Training process got stuck when loss=dpo sample_during_eval=true trainer=FSDPTrainer · closed 2 months ago by kygguo · 1 comment
#84 · AB test training update and initial metrics on TruthfulQA · closed 4 months ago by lesnikow · 0 comments
#83 · How are evals done on trained models? · opened 4 months ago by lesnikow · 0 comments
#82 · Hi @eric-mitchell , · closed 3 weeks ago by Gryff1ndor · 3 comments
#81 · Where is the config document of IPO? · opened 5 months ago by 3244we · 1 comment
#80 · Using Mistral 7B with transformers v4.38.1 on MATH dataset, and facing memory leaks · opened 5 months ago by Jayant1234 · 0 comments
#79 · Initial commit src/data.py · closed 5 months ago by lesnikow · 0 comments
#78 · Division by Zero error sporadically occurs · closed 5 months ago by Jayant1234 · 1 comment
#77 · Weird logits and model starts degenerating while training DPO · opened 5 months ago by DungNasSa10 · 2 comments
#76 · Was it your intention to recreate wandb tables in iterator? · opened 6 months ago by huskydoge · 0 comments
#75 · Can DPO work on BERT-style Model? · opened 6 months ago by Leo-T-Zang · 0 comments
#74 · Hyperparameter experiments · closed 6 months ago by j-c-carr · 1 comment
#73 · The number of training steps in the SHP dataset · opened 6 months ago by bonin147 · 0 comments
#72 · Computing faster logps · opened 6 months ago by alexvishnevskiy · 3 comments
#71 · Implementation for Plackett-Luce rank model · opened 7 months ago by rohan598 · 1 comment
#70 · What's the reference policy of Preferred-FT in Figure 2? · opened 7 months ago by zetian1025 · 0 comments
#69 · My Code to Reproduce IMDB · opened 7 months ago by QiyaoWei · 0 comments
#68 · Why does SFT sum the cross-entropy loss within each sequence? · opened 7 months ago by YJWon99 · 1 comment
#67 · Using cross entropy loss to calculate DPO? · opened 7 months ago by zachares · 2 comments
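Several of the issues above (#67 on cross-entropy, #68 on summed log-probs) turn on what the DPO objective actually computes. As a point of reference, here is a minimal pure-Python sketch of the per-pair DPO loss as defined in the paper, written from the published formula rather than copied from this repo's trainers.py:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log sigmoid(beta * margin).

    margin = (policy/reference log-ratio of the chosen response)
           - (policy/reference log-ratio of the rejected response).
    This is binary cross-entropy with the chosen response as the positive
    class, which is the sense in which DPO "is" a cross-entropy loss (#67).
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) computed stably as log1p(exp(-x))
    return math.log1p(math.exp(-margin))
```

Note that when the policy and reference models coincide, the margin is 0 and the loss is log 2 ≈ 0.693 per pair, not 0 (cf. issue #46 further down this list).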
#66 · Unable to Run SFT · opened 7 months ago by Rui-Yuan91 · 3 comments
#65 · Bug in loading Llama tokenizer? · closed 8 months ago by ajyl · 1 comment
#64 · Question about IPO loss vs DPO loss · opened 8 months ago by MoonBlvd · 1 comment
#63 · Appendix A.4 of the paper: the derived gradient is not consistent with the main text · closed 8 months ago by yflyzhang · 2 comments
#62 · Reproducing Win Rate inference for TL;DR · opened 8 months ago by jdchang1 · 1 comment
#61 · Question regarding the logits in the `_get_batch_logps` function · closed 9 months ago by vgoklani · 2 comments
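Questions about `_get_batch_logps` and `average_log_prob` recur throughout this list (#61, #57, #48, #44, #40). The snippet below is a simplified, single-sequence analogue of what such a function computes, written here for illustration; the repo's actual implementation operates on batched torch tensors:

```python
import math

def get_sequence_logp(token_logits, labels, ignore_index=-100,
                      average_log_prob=False):
    """Per-sequence log-probability from per-step logits and target labels.

    For each position: log-softmax the logits, take the label token's
    log-prob, and skip positions whose label is ignore_index (prompt or
    padding). Sum over the sequence, or average when average_log_prob
    is True. Illustrative sketch, not the repo's torch implementation.
    """
    total, count = 0.0, 0
    for logits, label in zip(token_logits, labels):
        if label == ignore_index:
            continue  # masked positions contribute nothing
        log_z = math.log(sum(math.exp(x) for x in logits))  # log-softmax denominator
        total += logits[label] - log_z
        count += 1
    return total / count if average_log_prob else total
```

The `average_log_prob` flag is what the average_log_prob issues below ask about: summing favors no length normalization (longer sequences get more negative log-probs), while averaging divides by the number of unmasked tokens.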
#60 · Possible Inconsistency (Possibly Typo) in Gradient Definition Between Eq. 7 and Appendix A.4 in DPO paper · closed 9 months ago by rustic-snob · 1 comment
#59 · Unable to run the code for Step 2: Run SFT · closed 9 months ago by ppsmk388 · 1 comment
#58 · Question about fine-tuning steps (epochs) · closed 9 months ago by gyuwon12 · 2 comments
#57 · Question about _get_batch_logps of trainers.py · closed 9 months ago by wulaoshi · 3 comments
#56 · DPO did not achieve the expected experimental effect · opened 10 months ago by Vance0124 · 2 comments
#55 · Training cost: RLHF vs DPO · closed 9 months ago by kartheekmedathati · 1 comment
#54 · How to re-implement the result of IMDB sentiment generation · opened 10 months ago by junkangwu · 0 comments
#53 · Llama-2-13b-chat valid reward accuracy remains ~50% · opened 11 months ago by nxphi47 · 0 comments
#52 · Qwen model issues & embedding and loss has NaN · opened 11 months ago by lylcst · 4 comments
#51 · Error when following the README to train SFT on multiple cards using FSDPTrainer · opened 11 months ago by NekoMimiUnagi · 1 comment
#50 · Pythia-2.8B model weights · closed 11 months ago by alexv-cerebras · 2 comments
#49 · How to load the trained model for inference? · closed 10 months ago by VibhuAg · 2 comments
#48 · Question about average_log_prob · opened 11 months ago by LSX-Sneakerprogrammer · 9 comments
#47 · No such file or directory: json-train-00000-00000-of-NNNNN.arrow · opened 11 months ago by qingerVT · 1 comment
#46 · Loss is 0 when policy and reference models are the same · closed 11 months ago by luffycodes · 3 comments
#45 · Questions about the IMDB Sentiment dataset · opened 11 months ago by stevie1023 · 3 comments
#44 · Questions about the average_log_prob parameter · opened 12 months ago by liumingzhu6060 · 0 comments
#43 · Is fine-tuning with e.g. LoRA supported? · opened 1 year ago by Emerald01 · 1 comment
#42 · llama7B issue · opened 1 year ago by JiuhaiChen · 15 comments
#41 · Strange loss pattern · opened 1 year ago by puyuanOT · 0 comments
#40 · Question about average_log_prob · opened 1 year ago by Kyeongpil · 5 comments