rlaif Search Results - Githubissues

56 results
for rlaif

huggingface/trl #1964

Setting the `dataset_num_proc>1` process on `DPOTrainer` see…

When I set up just the `dataset_num_proc=2` process in `DPOTrainer`, it seemed to completely pause the step of initializing the trainer map dataset, even though my dataset only had two data points and…

zhangzef updated 2 weeks ago
12
modelscope/ms-swift #1878

ImportError: cannot import name 'LlavaOnevisionForConditiona…

**Describe the bug** I am trying to SFT fine-tune the model `llava-onevision-qwen2-0_5b-ov` using the following command: ``` swift sft \ --model_type llava-onevision-qwen2-0_5b-ov \ --dat…

Lopa07 updated 1 week ago
3
RLHF-V/RLAIF-V #13

Error when loading datasets split

Thanks for your wonderful work. When I tried to load the dataset, an error occurred. However, the data extracting process goes well. How to fix it? OSError: Cannot find data file. Original erro…

Xuchen-Li updated 1 month ago
1
OpenAdaptAI/OpenAdapt #393

Implement Reinforcement Learning with Inhuman Feedback

### Feature request https://github.com/CarperAI/trlx ### Motivation https://twitter.com/i/web/status/1668337702440165376 ![image](https://github.com/OpenAdaptAI/OpenAdapt/assets/774615/d04…

abrichr updated 1 year ago
7
OpenLMLab/MOSS-RLHF #43

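About the RM model training strategy and loss function

First of all, congratulations on winning the best paper award! I have a question: I would like to experiment with the label smoothing described in the paper, but I could not find any label-smoothing modification to the loss in the code, nor any code that adds a margin to the loss. Has this part not been released?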

tonylin52 updated 1 week ago
12
modelscope/ms-swift #2012

[Re-appeared] DPO training error UnboundLocalError: local va…

**This bug has re-appeared in the latest ms-swift version.** This bug was initially reported in [this issue](https://github.com/modelscope/ms-swift/issues/1734), and was solved promptly. Now, with th…

Lopa07 updated 2 days ago
1
PKU-Alignment/align-anything #40

[Question] LLaVA DPO training loss increases

### Required prerequisites - [X] I have read the documentation . - [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/align-anything/issues) and [Discussions](https://github.com…

fangqi-Zhu updated 2 days ago
4
OpenBMB/MiniCPM-V #300

[Data info] MiniCPM-llama3-V 2.5

Hello, and Thanks for the amazing work! Very much appreciated :) I have read the [MiniCPM-V 2 blog](https://openbmb.vercel.app/minicpm-v-2-en) and the sources you cite in the blog post, as well as …

emanuelevivoli updated 1 week ago
1
pytorch/pytorch #118369

NCCL watchdog thread terminated with exception: [Rank 7] Wat…

### 🐛 Describe the bug I got the following error when running supervised fine tuning of LLAMA2-7b model on P4 instance. Command that I ran: `deepspeed trainer_sft.py --configs llama2-7b-sft-RLAIF…

tamanna-mostafa updated 6 months ago
7
unslothai/unsloth #320

Lora downcasting issue

When creating a PEFT model and then trying to train it, we get an error; ``` File "/scratch/gpfs/ashwinee/unsloth/unsloth/kernels/fast_lora.py", line 106, in backward d_do…

kiddyboots216 updated 1 month ago
18
