-
Are there any plans to release the DPO code, or could you give a brief introduction to how you conducted long-context DPO?
-
```
Traceback (most recent call last):
  File "/root/work/GC/model-editing-ft/run.py", line 89, in <module>
    main(args)
  File "/root/work/GC/model-editing-ft/run.py", line 68, in main
    trainer.train()
…
```
-
`trl/trainer/dpo_trainer.py`, line 542
The tokenizer passed to `super().__init__()` should be `self.tokenizer` instead of `tokenizer`; otherwise the earlier `is_vision_model` handling has no effect.
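A minimal sketch of the pattern being reported; the class names and the vision-model logic below are hypothetical stand-ins, not TRL's actual implementation:

```python
# Hypothetical sketch; not TRL's actual code.
class BaseTrainer:
    def __init__(self, tokenizer=None):
        self.tokenizer = tokenizer

class DPOTrainer(BaseTrainer):
    def __init__(self, tokenizer=None, is_vision_model=False):
        # Suppose earlier __init__ logic replaces the tokenizer for vision models.
        self.tokenizer = f"vision<{tokenizer}>" if is_vision_model else tokenizer
        # Bug: forwarding the raw `tokenizer` argument discards that replacement:
        #     super().__init__(tokenizer=tokenizer)
        # Suggested fix: forward the attribute set above instead.
        super().__init__(tokenizer=self.tokenizer)

trainer = DPOTrainer(tokenizer="llama-tok", is_vision_model=True)
print(trainer.tokenizer)  # prints "vision<llama-tok>": the replacement is preserved
```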
-
**Describe the bug**
![image](https://github.com/user-attachments/assets/bc125f23-b4e3-4786-a062-684944e42140)
**Additional context**
```
SIZE_FACTOR=8 MAX_PIXELS=602112 torchrun --nproc_per_node …
```
-
To save compute.
Another hard issue :)
-
Made a few typography updates per recent DPO decisions. Tokens will be available once the latest Figma tokens are pulled in via [#87](https://github.com/patternfly/design-tokens/issues/87). Decided to…
-
I see that the paper says the annotator can be adjusted through the prompt, but the TRL implementation uses a score. Is this different from the paper?
-
In the RLHF workflow paper, the Reward Model is used to annotate new data generated by the LLM during the iterative DPO process, resulting in scalar values. According to Algorithm 1, the traditional R…
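As a rough illustration of that annotation step, here is a minimal sketch that scores two candidate responses with a sequence-classification reward model and keeps the higher-scoring one as the "chosen" side of a DPO preference pair; the model id, prompt, and responses are placeholders, not the paper's setup:

```python
# Hypothetical sketch of reward-model annotation for iterative DPO.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_name = "my-org/my-reward-model"  # placeholder model id
tok = AutoTokenizer.from_pretrained(reward_name)
rm = AutoModelForSequenceClassification.from_pretrained(reward_name)
rm.eval()

def score(prompt: str, response: str) -> float:
    # Encode the (prompt, response) pair and return a scalar reward.
    inputs = tok(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return rm(**inputs).logits[0, 0].item()

prompt = "Explain DPO in one sentence."                 # placeholder prompt
a, b = "Candidate response A", "Candidate response B"   # LLM generations
chosen, rejected = (a, b) if score(prompt, a) >= score(prompt, b) else (b, a)
pair = {"prompt": prompt, "chosen": chosen, "rejected": rejected}
print(pair)  # one preference record for the next DPO round
```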
-
Hello! I'm interested in your work, and I want to build on it.
I already have generated images, split into preferred and unpreferred sets. How can I train a diffusion model wit…