reward-modeling Search Results

609 results for reward-modeling

Best match


hashgraph/guardian #54

Mitigation Credits Research

### Problem description Non-offset GHG reductions, such as those resulting from corporate energy conservation or efficiency initiatives, are currently not supported in Guardian. These are generally me…

danielnorkin updated 6 months ago · 7 comments
hiyouga/LLaMA-Factory #1235

Single machine, multiple GPUs, model dtype bf16: error during the PPO stage.

Launch script: output_model=ppo_lora_output if [ ! -d ${output_model} ];then mkdir ${output_model} fi export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch src/train_bash.py \ --stage …

mycutedog updated 11 months ago · 1 comment
monke-mob/monke-activities #29

How about we redo the lobby and add a village with side quests?

### Proposal While the current lobby has no issues, it is lacking a few things: engagement, life, and space. So why not redo it? We could redo the lobby to include a lot more open space, environment, l…

homiemace updated 11 months ago · 5 comments
keean/zenscript #11

Syntax summary

I will maintain in this OP a summary of current proposed syntax as I understand it to be. Note this is not authoritative, subject to change, and it may be inaccurate. Please make comments to discuss. …

shelby3 updated 1 year ago · 1191 comments
vwxyzjn/lm-human-preference-details #6

Questions about `left_padding_to_right_padding`

Hi Costa, Thanks for sharing the awesome implementations! It is tremendously helpful for my own work. I noticed a few uses of the function [left_padding_to_right_padding](https://github.com/vwx…

liutianlin0121 updated 1 year ago · 4 comments
huggingface/trl #809

PPO on multi-GPU but get Error: Expected all tensors to be o…

I am training alpaca-7B on 4 × A100 80G. I am using the provided Deepspeed-zero2 yaml file in the repository as the configuration file and running it. Even when I set the device-map of the model to No…

Ricardokevins updated 9 months ago · 20 comments
pyrddlgym-project/pyRDDLGym-symbolic #2

Potential DBN inconsistencies

When I played around with the navigation domain (I appended the `domain.rddl` and `instance.rddl`) and displayed the DBN of single states and ground fluents, I could not interpret the results correctl…

GMMDMDIDEMS updated 7 months ago · 10 comments
rust-embedded/wg #101

[WIP] pre-RFC: Better management of SVD to Chip Support Crat…

- Feature Name: Central Management of SVD to CSP process - Start Date: 2018-05-xx - RFC PR: - Rust Issue: # Summary [summary]: #summary I need to make this a paragraph, but my main points a…

jamesmunns updated 3 months ago · 25 comments
shibing624/MedicalGPT #166

RL stage: running raises ValueError: Got unexpected arguments: {'token_typ…

### Describe the bug Please provide a clear and concise description of what the bug is. If applicable, add screenshots to help explain your problem, especially for visualization related problems. …

izhaomeng updated 1 year ago · 1 comment
eric-mitchell/direct-preference-optimization #32

why do prob_eval(train)/chosen and rewards_eval(train)/chose…

I have reviewed the wandb training curves provided, and I have a question: why do prob_eval(train)/chosen and rewards_eval(train)/chosen gradually decrease? I originally thought that these two metrics w…

bowuaaa updated 1 year ago · 6 comments

