-
In the RLHF process, are the actor, reference, critic, and reward models all 7B? Is offload enabled? I am using 4×80 GB GPUs; with offload enabled, memory usage already reaches 60 GB right after loading the models, and with batch size = 4 the GPU memory is completely full.
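For context, a minimal sketch of the kind of ZeRO-3 CPU-offload configuration this usually refers to; whether it applies depends on the framework in use, and all values here are illustrative placeholders, not the reporter's actual config:
```python
# Illustrative DeepSpeed ZeRO-3 config with CPU offload; the keys follow
# the public DeepSpeed JSON schema, but every value is a placeholder.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 1,
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "bf16": {"enabled": True},
}
```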
-
Typically, the PPO algorithm collects one episode of data and computes the discounted return / advantage / GAE over the whole episode to update the critic.
In a sentiment-analysis or dialogue task, what counts as an episode?
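For reference, a minimal sketch of GAE over one episode (the function name and defaults are illustrative). In common RLHF implementations, one episode is a single sampled response (prompt plus completion): each generated token is a step, and the reward-model score lands on the final token.
```python
import numpy as np

def compute_gae(rewards, values, gamma=0.99, lam=0.95):
    """GAE over one episode.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T), length T + 1 including a bootstrap value
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    # Walk backwards through the episode, accumulating the TD residuals.
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    returns = advantages + np.asarray(values[:-1])  # critic regression targets
    return advantages, returns
```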
-
**Is your feature request related to a problem? Please describe.**
We should include a tutorial for SFT. Although we have SteerLM, including an SFT tutorial is important because it is the simple…
-
When I run the inference logic using the following script, I get a `RuntimeError: No available kernel. Aborting execution.` error:
```
A100 GPU detected, using flash attention if input tensor is on…
```
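Since the script itself is truncated, this is only an assumed workaround sketch: disabling the flash backend so `scaled_dot_product_attention` can fall back to the math or memory-efficient kernels (PyTorch 2.x API):
```python
import torch
import torch.nn.functional as F

# Assumed workaround: turn off the flash backend so SDPA falls back to
# the math / memory-efficient kernels instead of raising "No available
# kernel". Tensor shapes here are arbitrary placeholders.
q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
with torch.backends.cuda.sdp_kernel(
    enable_flash=False, enable_math=True, enable_mem_efficient=True
):
    out = F.scaled_dot_product_attention(q, k, v)
```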
-
**Describe the bug**
When I follow the instructions at https://labelstud.io/tags/ranker.html to create a Ranker tag, it is not displayed in the interface.
**To Reproduce**
Steps to repr…
-
When training the PPO model, I turned on gradient_checkpointing_enable. If you want to compute the ptx loss, the actor will forward twice. In your code, these two losses are executed backward once se…
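To make the pattern being discussed concrete, a toy sketch (the model, inputs, and `ptx_coef` are placeholders, not the repository's code):
```python
import torch

# Placeholder actor and data; ptx_coef is a hypothetical weighting term.
actor = torch.nn.Linear(4, 4)
x_rl, x_ptx = torch.randn(2, 4), torch.randn(2, 4)
ptx_coef = 0.5

actor_loss = actor(x_rl).pow(2).mean()  # stand-in for the PPO policy loss
ptx_loss = actor(x_ptx).pow(2).mean()   # stand-in for the pretraining loss

# Summing into one scalar means autograd traverses both graphs in a single
# backward pass, rather than calling backward() separately on each loss.
(actor_loss + ptx_coef * ptx_loss).backward()
```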
-
### 🚀 The feature, motivation, and pitch
Add JAX support for RLHF on TPUs.
### Alternatives
_No response_
### Additional context
_No response_
-
**Bug**
Hello,
I am trying to run the summarize_rlhf example using [this blog on wandb](https://wandb.ai/carperai/summarize_RLHF/reports/Implementing-RLHF-Learning-to-Summarize-with-trlX--VmlldzozMzA…
-
### 🐛 Describe the bug
GPU: 8*A6000
CUDA Version: 11.7
Python Version: 3.8.10
colossalai Version: 0.2.8
When I train PPO with
```
torchrun --standalone --nproc_per_node=8 train_prompts.py \
…
```
-
Hi, @xujz18 @Xiao9905
Thanks for this nice contribution. I noticed that we can load ImageReward data with:
`datasets.load_dataset("THUDM/ImageRewardDB", "8k")`
However, the loaded data seem to…
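For anyone reproducing this, a quick way to inspect what the loader actually returns (the split name `"train"` is an assumption):
```python
from datasets import load_dataset

# Load the 8k subset as in the report, then inspect its structure.
ds = load_dataset("THUDM/ImageRewardDB", "8k")
print(ds)                    # available splits
print(ds["train"].features)  # column names and types
print(ds["train"][0])        # first example
```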