reward-modeling Search Results

639 results for reward-modeling

Best match


KhronosGroup/glTF #1051

glTF roadmap - what would you like to see next in glTF?

Hi all - please chime in with any and all feedback to help drive the direction of glTF beyond 2.0. Even simple +1/-1's for topics are appreciated. How much should we focus on building out the soft…

pjcozzi updated 3 months ago
214
hiyouga/LLaMA-Factory #828

On the loss computation in the Reward Modeling stage

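Currently, the `compute_loss` part of `src/llmtuner/tuner/rm/trainer.py` uses the loss proposed in InstructGPT, $\mathrm{Loss} = -\log(\sigma(r_{\theta}(x, y_c) - r_{\theta}(x, y_r)))$. However, the LLaMA2 paper mentions that a term m can be added to the loss to calibrate different degrees of preference; the original text reads…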

Naming-isDifficult updated 1 year ago
1
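
For readers skimming this result, the pairwise loss quoted in the snippet above can be sketched in a few lines. This is a minimal illustration, not the code in `trainer.py`: the function name `pairwise_reward_loss` is hypothetical, the rewards are plain floats rather than model outputs, and the `margin` argument assumes the m term is subtracted inside the sigmoid, which the truncated snippet does not confirm.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float, margin: float = 0.0) -> float:
    """Return -log(sigmoid(r_chosen - r_rejected - margin)).

    With margin=0.0 this is the InstructGPT-style loss from the snippet.
    The margin placement is an assumption made for illustration.
    """
    diff = r_chosen - r_rejected - margin
    # -log(sigmoid(d)) equals log(1 + exp(-d)); branch on the sign of d
    # so that exp() is only ever called on a non-positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical reward values for a chosen and a rejected response.
loss_plain = pairwise_reward_loss(1.5, 0.5)
loss_with_margin = pairwise_reward_loss(1.5, 0.5, margin=1.0)
```

Under this assumption, a larger margin raises the loss for the same reward gap, so the model is pushed to separate strongly preferred pairs further than weakly preferred ones.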
huggingface/transformers #28360

Pythia (GPTNeoXForCausalLM) Regression (inference time) in t…

### System Info - `transformers` version: 4.35.0 - Platform: Linux-5.16.19-76051619-generic-x86_64-with-glibc2.35 - Python version: 3.10.11 - Huggingface_hub version: 0.17.3 - Safetensors versi…

JonasGeiping updated 9 months ago
1
fly51fly/aicoco #3

Teacher Aikeke's trending shares from the past 24 hours

Selected Weibo content

fly51fly updated 3 weeks ago
1906
martin2250/OpenCNCPilot #28

General discussion

As the title states. Please only use this thread for questions and discussion, and open new feature requests for actual issues with OpenCNCPilot. Martin

martin2250 updated 2 years ago
262
All-Hands-AI/OpenHands #1

Feature Outline and Requirements Engineering

Took a crack at what I think this thing should do (with ChatGPT of course). ## Ideal Scope and Capabilities ### 1. Task Understanding - **Natural Language Processing (NLP)**: The AI must exc…

yourbuddyconner updated 7 months ago
8
huggingface/trl #1195

RewardTrainer fails with FSDP

I've just run into an odd issue with FSDP & RewardTrainer. It seems that when using FSDP, the output of the (sequence classification) model's `forward` function isn't as expected. Normally, it retur…

mgerstgrasser updated 10 months ago
1
shibing624/MedicalGPT #271

Error(s) in loading state_dict for {}:\n\t{}'.format( Runtim…

Hello, I get an error when merging the reward_modeling adapter (adapter_model) with the base model. Error message: File "/home/airuser/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load…

waycup7 updated 11 months ago
1
h2oai/h2ogpt #1353

minimum requirement for running h2ogpt docker : CUDA out of …

Q1) Is there any minimum requirement for running the h2ogpt docker? Should the GPU have at least N GB? - got "torch.cuda.OutOfMemoryError: CUDA out of memory." - currently using GeForce RTX…

ctxwing updated 9 months ago
6
hiyouga/LLaMA-Factory #1696

Missing files when loading the reward model for full-parameter PPO training

### Reminder - [X] I have read the README and searched the existing issues. ### Reproduction Training script: --stage ppo \ --do_train \ --cutoff_len 1024 \ --template default \ --model_n…

hannlp updated 11 months ago
1
