-
Installed version:
![image](https://github.com/user-attachments/assets/9f1c8790-11e2-4452-ad4c-382f72d87cfb)
![image](https://github.com/user-attachments/assets/13d5c64a-e45b-4e82-a6cc-cd2beb709e92)
Problem description:
…
-
Bad documentation; the errors are not very long.
Detecting toxicity in outputs generated by Large Language Models (LLMs) is crucial for ensuring that these models produce safe, respectful, and appropriate con…
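As a concrete illustration (not from the original issue), toxicity scoring of LLM outputs can be sketched with an off-the-shelf text classifier; the model name below is one publicly available choice, assumed purely for the example:

```python
# Minimal sketch: score LLM outputs for toxicity with a HuggingFace classifier.
# "unitary/toxic-bert" is an illustrative, publicly available choice; any
# text-classification model trained for toxicity works the same way.
from transformers import pipeline

toxicity = pipeline("text-classification", model="unitary/toxic-bert")

outputs = ["You are wonderful.", "I hate you, you idiot."]
for text in outputs:
    result = toxicity(text)[0]  # dict with 'label' and 'score'
    print(f"{text!r} -> {result['label']} ({result['score']:.3f})")
```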
-
When initializing the reward and reference models in step 3 of DeepSpeed-Chat, two kinds of DeepSpeed config files are used, i.e. ds_config and ds_eval_config. May I ask why we need to use two configs…
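A hedged guess at the answer, based on how frozen models are normally handled: in step 3 only the actor and critic are trained, while the reward and reference models run inference only, so an eval-only config can drop optimizer-related settings entirely. A minimal sketch of what the two configs might look like (all keys and values below are illustrative, not copied from DeepSpeed-Chat):

```python
# Hypothetical sketch: training config for models that take gradient steps.
ds_config = {
    "train_batch_size": 32,
    "zero_optimization": {"stage": 2},  # partitions optimizer state + gradients
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "fp16": {"enabled": True},
}

# Hypothetical sketch: eval-only config for the frozen reward/reference models.
# No optimizer section is needed because these models never update weights;
# ZeRO stage 3 can still be used purely to shard the (large) parameters.
ds_eval_config = {
    "train_batch_size": 32,  # DeepSpeed still requires batch-size fields at init
    "zero_optimization": {"stage": 3},
    "fp16": {"enabled": True},
}
```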
-
### Required prerequisites
- [X] I have read the documentation .
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-…
-
**Describe the bug**
I am trying to fine-tune tiiuae/falcon-7b-instruct and I am getting this error.
`TypeError: where(): argument 'condition' (position 1) must be Tensor, not bool`
**To Reproduce**…
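For context, this is the error PyTorch raises when a plain Python bool is passed as the condition to torch.where, which expects a tensor. A minimal, self-contained repro (independent of Falcon) looks like this:

```python
import torch

x = torch.ones(3)
y = torch.zeros(3)

# Passing a plain Python bool raises:
# TypeError: where(): argument 'condition' (position 1) must be Tensor, not bool
# torch.where(True, x, y)

# The condition must be a boolean tensor instead:
mask = torch.tensor([True, False, True])
print(torch.where(mask, x, y))  # tensor([1., 0., 1.])
```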
-
When running step 3 with ZeRO stage 3 enabled for both the actor and critic models,
I get the following error (line numbers may be offset due to debug statements I've added):
```
File "/path/DeepSp…
```
-
### SFT data
1. Started the SFT stage with publicly available instruction tuning data ([Chung et al., 2022](https://arxiv.org/pdf/2210.11416))
2. Fewer but high-quality data > millions of data but low …
-
### Bug Report
I have tried to reproduce the results on my own using Llama 3.1 8B.
I can successfully run the SFT and reward model trainers, but the cost model trainer consistently crashes.
…
-
This issue serves to inform about and discuss the next major release of Tianshou, after which the library can be considered mature and stable from our perspective. The progress and the related …