-
- Feature Name: Central Management of SVD to CSP process
- Start Date: 2018-05-xx
- RFC PR:
- Rust Issue:
# Summary
[summary]: #summary
I need to make this a paragraph, but my main points a…
-
1. transformers version: 4.31.0 (the latest release);
2. The changes made so far are:
```
--- a/reward_modeling.py
+++ b/reward_modeling.py
@@ -34,6 +34,7 @@ from transformers import (
Trainer,
TrainingArguments,
set_s…
```
-
Could not estimate the number of tokens of the input, floating-point operations will not be computed
Traceback (most recent call last):
File "/root/nas-share/chat/MedicalGPT-main/reward_modeling.p…
-
### 🐛 Describe the bug
Hi,
I'm very new to TRLX, PEFT, and Huggingface, so I'm not sure if I just have some simple configuration wrong, but I am trying to recreate the notebook [here](https://cola…
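For readers new to PEFT, the core idea its LoRA adapters implement can be shown without the library itself. Below is a minimal from-scratch sketch (my own illustration, not the notebook's or PEFT's actual code): the base linear layer is frozen, and only a low-rank update `B @ A` is trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: frozen base weight plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen
        # A is small random, B starts at zero, so the layer initially
        # behaves exactly like the frozen base layer.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

layer = LoRALinear(nn.Linear(16, 16))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # only A (4*16) and B (16*4) train: 128 parameters
```

PEFT's `get_peft_model` applies this same pattern to selected submodules of a pretrained model, which is why adapter fine-tuning touches only a small fraction of the parameters.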
-
### Describe the Question
Following the steps in run_training_pipeline.ipynb, Stage 1 and Stage 2 both complete successfully, but the third stage, RM (Reward Model) training, fails with an error. Please help resolve it.
Error: **ValueError: weight is on the meta device, we need a `value` to put …
-
Direction changed, txt will be updated soon.
Old stuff:
- 1997: [The Internet: A Future Tragedy of the Commons?](https://link.springer.com/chapter/10.1007/978-1-4757-2644-2_22)
- [Internet Securi…
-
![image](https://github.com/hiyouga/LLaMA-Efficient-Tuning/assets/26586964/b36befe3-0e2b-4954-b14d-8af7dc221b16)
![image](https://github.com/hiyouga/LLaMA-Efficient-Tuning/assets/26586964/9bddca35-ae…
-
Hello!
I am trying to get the `reward_modeling.py` file to work on a smaller scale by using gpt2 as a reward model.
The only changes I made to the file from its current version in the repo w…
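A common way to stand in a small reward model (not necessarily what this repo does) is a GPT-2 backbone with a single-logit sequence-classification head. The sketch below builds it from a tiny config so no checkpoint download is needed; the sizes are arbitrary assumptions for illustration.

```python
import torch
from transformers import GPT2Config, GPT2ForSequenceClassification

# Tiny, randomly initialized GPT-2 with num_labels=1: one scalar "reward"
# per input sequence.
config = GPT2Config(n_layer=2, n_head=2, n_embd=64, num_labels=1)
config.pad_token_id = config.eos_token_id  # GPT-2 defines no pad token

model = GPT2ForSequenceClassification(config)

input_ids = torch.randint(0, 1000, (2, 8))  # a batch of 2 dummy sequences
rewards = model(input_ids).logits           # shape (2, 1)
print(rewards.shape)
```

For real use you would load pretrained weights (e.g. `from_pretrained("gpt2", num_labels=1)`) instead of a random config, but the head and output shape are the same.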
-
### Describe the bug
Please provide a clear and concise description of what the bug is. If applicable, add screenshots to help explain your problem, especially for visualization related problems.
-
This might be a dumb question, but I am having trouble seeing how the README example matches the way reward modeling is described in the recent papers I've read on it.
From the readme, the example …
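For comparison with the README example, the pairwise loss used in InstructGPT-style reward modeling papers can be sketched as follows (my own minimal version, not the repo's implementation): the model scores a chosen and a rejected response, and the loss `-log sigmoid(r_chosen - r_rejected)` pushes the chosen score above the rejected one.

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(r_chosen: torch.Tensor,
                         r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss: -log sigmoid(r_chosen - r_rejected), batch-averaged."""
    # softplus(-d) is a numerically stable form of -log(sigmoid(d))
    return F.softplus(-(r_chosen - r_rejected)).mean()

r_chosen = torch.tensor([2.0, 1.0])    # rewards for preferred responses
r_rejected = torch.tensor([0.0, 1.5])  # rewards for rejected responses
loss = pairwise_reward_loss(r_chosen, r_rejected)
print(float(loss))
```

The loss shrinks toward zero as the margin between chosen and rejected rewards grows, which is exactly the preference signal the papers describe.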