-
Are there any plans to release the DPO code, or could you give a brief introduction to how you conducted long-context DPO?
-
```
Traceback (most recent call last):
  File "/root/work/GC/model-editing-ft/run.py", line 89, in <module>
    main(args)
  File "/root/work/GC/model-editing-ft/run.py", line 68, in main
    trainer.train()
…
```
-
`trl/trainer/dpo_trainer.py`, line 542
The tokenizer passed to `super().__init__()` should be `self.tokenizer` instead of `tokenizer`; otherwise the earlier `is_vision_model` handling has no effect.
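A minimal sketch of the pattern being reported; the class names and the vision-model logic below are hypothetical stand-ins, not TRL's actual implementation:

```python
# Hypothetical sketch; not TRL's actual code.
class BaseTrainer:
    def __init__(self, tokenizer=None):
        self.tokenizer = tokenizer

class DPOTrainer(BaseTrainer):
    def __init__(self, tokenizer=None, is_vision_model=False):
        # Suppose earlier __init__ logic replaces the tokenizer for vision models.
        self.tokenizer = f"vision<{tokenizer}>" if is_vision_model else tokenizer
        # Bug: forwarding the raw `tokenizer` argument discards that replacement:
        #     super().__init__(tokenizer=tokenizer)
        # Suggested fix: forward the attribute set above instead.
        super().__init__(tokenizer=self.tokenizer)

trainer = DPOTrainer(tokenizer="llama-tok", is_vision_model=True)
print(trainer.tokenizer)  # prints "vision<llama-tok>": the replacement is preserved
```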
-
**Describe the bug**
![image](https://github.com/user-attachments/assets/bc125f23-b4e3-4786-a062-684944e42140)
**Additional context**
```
SIZE_FACTOR=8 MAX_PIXELS=602112 torchrun --nproc_per_node …
```
-
To save compute.
Another hard issue :)
-
Made a few typography updates per recent DPO decisions. Tokens will be available once the latest Figma tokens are pulled in via [#87](https://github.com/patternfly/design-tokens/issues/87). Decided to…
-
I see that the paper says the annotator can be adjusted through the prompt, but the TRL implementation uses a score. Is this different from the paper?
-
In the RLHF workflow paper, the Reward Model is used to annotate new data generated by the LLM during the iterative DPO process, resulting in scalar values. According to Algorithm 1, the traditional R…
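As a rough illustration of that annotation step, here is a minimal sketch that scores two candidate responses with a sequence-classification reward model and keeps the higher-scoring one as the "chosen" side of a DPO preference pair; the model id, prompt, and responses are placeholders, not the paper's setup:

```python
# Hypothetical sketch of reward-model annotation for iterative DPO.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reward_name = "my-org/my-reward-model"  # placeholder model id
tok = AutoTokenizer.from_pretrained(reward_name)
rm = AutoModelForSequenceClassification.from_pretrained(reward_name)
rm.eval()

def score(prompt: str, response: str) -> float:
    # Encode the (prompt, response) pair and return a scalar reward.
    inputs = tok(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return rm(**inputs).logits[0, 0].item()

prompt = "Explain DPO in one sentence."                 # placeholder prompt
a, b = "Candidate response A", "Candidate response B"   # LLM generations
chosen, rejected = (a, b) if score(prompt, a) >= score(prompt, b) else (b, a)
pair = {"prompt": prompt, "chosen": chosen, "rejected": rejected}
print(pair)  # one preference record for the next DPO round
```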
-
Hello! I'm interested in your work, and I want to build on it.
I already have generated images, split into preferred and unpreferred sets. How can I train a diffusion model wit…