reward-modeling Search Results

639 results
for reward-modeling

Best match

larsiusprime/georgism #3

First and incomplete link dump

I will place suggested tags/categories both on top of and beside the links. -- Misc Land Value Tax and Farming Parts 1, 2, and 3 https://www.youtube.com/channel/UCw2WENjbuO_C_9cXkLU1iKg (vid…

CountBla updated 2 years ago
1
hiyouga/LLaMA-Factory #2177

Error: INFO - llmtuner.model.utils - Failed to load pytorch_model…

### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction OUTPUT= OUTPUT_PATH LR=1e-6 mkdir -p $OUTPUT CUDA_VISIBLE_DEVICES='3' python src/train_bash.py \ …

Eugene-Zh updated 10 months ago
2
polgaria/taiga #1

he is in

geniiii updated 4 years ago
69
huggingface/trl #993

Naive Parallelism Multi-GPU example script fails

Hi, I tried to adapt this example script to use device='auto', to support a larger model. https://github.com/huggingface/trl/blob/main/examples/scripts/ppo_multi_adapter.py Unfortunately it fails on…

johncookds updated 12 months ago
1
hiyouga/LLaMA-Factory #499

Baichuan 13b-base PPO runs out of memory

I searched the related issues, updated the code, and so on, but nothing solved it. **Environment:** python3.10 cuda: 11.8 transformers 4.31.0.dev0 torch 2.0.1+cu118 accelerate 0.21.0.dev0 peft …

xuanxuanzl updated 9 months ago
6
argilla-io/argilla #3560

[DOCS] create a tutorial on using `TRL` and `RLHF` (via rewa…

## Which page or section is this issue related to? It might be a nice use case to be able to use the `ArgillaTrainer()` for `PPO` and `trl` for showcasing how we might use the `FeedbackDataset` to g…

davidberenstein1957 updated 1 year ago
1
shibing624/MedicalGPT #127

AttributeError: 'BaiChuanConfig' object has no attribute 'nu…

Traceback (most recent call last): File "reward_modeling.py", line 649, in main() File "reward_modeling.py", line 397, in main model = model_class.from_pretrained( File "/home/zyn/…

qingjiaozyn updated 1 year ago
1
hiyouga/LLaMA-Factory #1916

I trained ppo with last version but get the warning

You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on you…

csytxwd updated 10 months ago
4
ashishjamarkattel/reinforment-learning-with-human-feedback #2

Issue regarding rlhf implementation

from random import choices from tqdm import tqdm import time import numpy as np import ast for epoch in range(1): for batch in tqdm(ppo_trainer.dataloader): (logs, game_data,) = (…

Pooja-1410 updated 10 months ago
13
meta-introspector/meta-meme #197

Eigenstatements

Source https://github.com/meta-introspector/meta-meme/wiki/Ode-to-heideigger#ode-to-heideigger ### Summary of Our Path 1. **Initial Concepts and Inspiration**: - We began by invoking the Mu…

jmikedupont2 updated 3 months ago
5
