reward-modeling Search Results

639 results for reward-modeling


microsoft/DeepSpeedExamples #407

Trained BLOOM model saved differently comparing to OPT model

Hi, I am following the script to train the BLOOM model on my own dataset. However, I found that it saves the model differently compared to other models such as OPT. The screenshot below shows the sav…

alibabadoufu updated 1 year ago
6
CarperAI/trlx #365

how to use trlx_inference_gptj.py to infer own trained model

### 🚀 The feature, motivation, and pitch Thank you for your great work. I used the scripts in summarize_rlhf to train my own PPO model (gpt-2) and tried to run inference. After converting the trained model to bin f…

Bo396543018 updated 1 year ago
4
ersilia-os/ersilia #872

✍️ Contribution period: <Luis_Camacho>

### Week 1 - Get to know the community - [X] Join the communication channels - [X] Open a GitHub issue (this one!) - [X] Install the Ersilia Model Hub and test the simplest model - [X] Write a motiva…

luiscamachocaballero updated 1 year ago
10
bitsandbytes-foundation/bitsandbytes #589

H2OGPT python -m bitsandbytes bug

Hello, trying to figure out why my h2ogpt doesn't use my GPU at all. Figured that something has to be wrong with bitsandbytes, since it says it was compiled without GPU support. I made everything work…

Frub3L updated 10 months ago
12
CarperAI/trlx #373

PPO training fails with NCCL timeout when running on larger …

Hello, I have successfully run the code summarize_rlhf with small SFT and RM models (bloom1b). However, when I try to run the larger model (7B), **the timeout error is raised,** which is a similar …

agave233 updated 1 year ago
8
kevslinger/DTQN #4

Question about transformer with DQN

Hi Kev, I'm glad to learn about your work on DTQN. I am very curious why there is so little work combining Transformers and DQN, given that both techniques emerged quite early. Because I thought ther…

8JasonStatham8 updated 1 year ago
6
chainreactors/picker #274

[Daily Feed] 2023-06-01

# Daily Security News (2023-06-01) - Files ≈ Packet Storm - [ ] [Qualcomm Adreno/KGSL Data Leakage](https://packetstormsecurity.com/files/172664/GS20230531163517.txt) - [ ] [Qualcomm Adreno/KGSL Unchecked Cast…

chainreactorbot updated 8 months ago
1
CarperAI/trlx #416

ppo_sentiments_llama position_ids error

### 🐛 Describe the bug Hi! I tried to run the ppo_sentiments_llama example but got the error below. ``` ╭─────────────────────────────── Traceback (most recent call last) ─────────────────────────…

TonyZhanghm updated 1 year ago
2
PKU-Alignment/safe-rlhf #9

[BUG] Error when running the PPO stage: CUDA error: device-side assert trigger…

### Required prerequisites - [X] I have read the documentation . - [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/P…

HaixHan updated 1 year ago
23
microsoft/DeepSpeedExamples #280

DeepSpeed-Chat step1 SFT evaluation error: size mismatch

Hi, I tried to reproduce the whole process on an 8xV100 server with the following command: ```bash python train.py --actor-model facebook/opt-13b --reward-model facebook/opt-350m --num-gpus 8 ``` Af…

M1n9X updated 1 year ago
2

