-
First of all, many thanks for this library — it makes it possible to run RLHF on multiple 13B models with good results.
Issue: using an aligned model SFT-ed on internal multi-turn data on top of https://huggingface.co/baichuan-inc/Baichuan2-13B-Base, together with the rm-anthropic_hh-lmsys-oasst-webgpt.pt reward model provided by OpenRLHF, the following exception occurs:
(CriticModelRayActo…
-
These are sample commands from the documentation:
```
torchrun --nnodes 1 --nproc_per_node 8 examples/stack_llama/scripts/supervised_finetuning.py --model_path= --streaming --no_gradient_checkpointi…
```
-
Agreement Contract
## Description
mTree is an agent-based modeling tool for building and testing microeconomic systems. The bounty will build an mTree capability to use the existing mTree model to w…
-
Hat tip @simontegg
http://www.wsj.com/articles/the-future-of-the-internet-is-flow-1443796858
I maybe 10% believe it; I believe some of the comments more. But it certainly fits the theme here...
-
### System Info
- `transformers` version: 4.30.2
- Platform: Linux-5.19.0-46-generic-x86_64-with-glibc2.35
- Python version: 3.10.11
- Huggingface_hub version: 0.15.1
- Safetensors version: 0.3…
-
Hi author, I'd like to ask: does freeze mode support PPO training?
After finishing the RM stage in freeze mode, when I move on to PPO the system reports that only LoRA supports PPO training.
-
I was installing LLM Studio on Ubuntu 22.04 using the same steps I did for 20.04, except that I followed the below steps to install the NVIDIA driver, and I am experiencing the below err…
-
For fine-tuning my own model on top of llama2, do I also need to go through all five of the steps below in order? I'm only adding my own training data to fine-tune on top of llama2 and don't need full pre-training. Thanks.
Pre-Training
Supervised Fine-Tuning
Reward Modeling
PPO Training
DPO Training
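For context on the last stage in the list: DPO differs from the RM + PPO pair in that it collapses reward modeling and RL into a single pairwise loss over the policy and a frozen reference model. A minimal sketch of that loss in plain Python (function name and the sample numbers are illustrative, not from any specific repo):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair.

    Each argument is the total log-probability of the chosen/rejected
    response under the trainable policy or the frozen reference model.
    """
    # Log-ratio of policy vs. reference for each response
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # -log(sigmoid(beta * margin)): small when the policy prefers the
    # chosen response more strongly than the reference model does
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Policy favors the chosen response relative to the reference -> low loss
print(dpo_loss(-10.0, -20.0, -12.0, -18.0, beta=0.1))
```

Because the reference log-probabilities can be precomputed, this stage needs only preference pairs and no separate reward model or rollout loop.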
-
When I run evaluate.py for llama, I find that the logits of llama's generation code (model.generate(xxx)) are different when I use different settings: eval_main_dolly.sh and eval_main_dolly_mp4.sh. Th…
-
### Required prerequisites
- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-…