-
Hi, I ran into the same issue when running `bash run_finetune_with_lora.sh` with `LLAMA-7b`. Here are my script and log:
```
#!/bin/bash
# Please run this script under ${project…
-
### 🐛 Describe the bug
Please correct me if I'm wrong, but it looks like SFT for Anthropic simply maximizes log p(x) on the entire dialogue history, rather than only maximizing log p(y|x), where x is…
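If the goal is to optimize only log p(y|x), the usual fix is to mask the prompt tokens out of the loss. A minimal sketch of that masking, assuming the `-100` ignore index used by the PyTorch/Hugging Face cross-entropy convention (`build_labels` is a hypothetical helper, not code from this repo):

```python
IGNORE_INDEX = -100  # tokens with this label are excluded from the loss

def build_labels(prompt_ids, response_ids):
    # Mask the prompt (x) so the loss is computed only on the response (y):
    # this trains log p(y|x) instead of log p(x, y) over the whole dialogue.
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)

labels = build_labels([101, 102, 103], [201, 202])
print(labels)  # [-100, -100, -100, 201, 202]
```

Training on the full history without this mask is what makes the objective log p(x) over the entire dialogue.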
-
Dear Rob,
After `Loading checkpoint shards: 100%` completes successfully, my computer returns this:
Traceback (most recent call last):
File "C:\Users\erwinnella\Desktop\h2ogpt\generate.py", line 16, in
…
-
**Describe the bug**
Hi everybody, I'm training a llama model in step 3 using deepspeed-chat. In version 0.10.1, it raised the following error ([see the logs below](https://github.com/microsoft/DeepSp…
-
## **Overview**
dMeter aims to revolutionize the way decentralized Measurement, Reporting, and Verification (dMRV) protocols operate in the field of regenerative action and ecosystem regeneration. …
-
Hello, thanks for this really cool repository. I've recently been learning about `pjit`, and your repo is a valuable reference resource.
I've had some issues running the example code. In particular, i…
-
I've recently discovered that reward modeling plays a crucial role in the third step of PPO training. In my previous reward model, the score would increase with more "thanks" tokens, regardless of t…
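A minimal sketch of the kind of probe that exposes this failure mode; `count_thanks_reward` is a hypothetical stand-in for a biased reward model, not any real scoring code:

```python
def count_thanks_reward(response: str) -> float:
    # Hypothetical stand-in for a biased reward model whose score grows
    # with the number of "thanks" tokens, regardless of answer quality.
    return float(response.split().count("thanks"))

base = "thanks for the detailed answer"
padded = base + " thanks thanks thanks"

# Padding the response with extra "thanks" tokens inflates the score --
# exactly the reward-hacking behavior described above.
print(count_thanks_reward(base), count_thanks_reward(padded))  # 1.0 4.0
```

Comparing scores on a response before and after appending filler tokens is a cheap sanity check to run before using a reward model in PPO.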
-
After running supervised_finetuning.py, I got:
```
(gh_trl) amd00@MZ32-00:~/llm_dev/trl$ ll llama-se/final_checkpoint/
total 32828
drwxrwxr-x 2 amd00 amd00 4096 Jul 9 11:13 ./
drwxrwxr-x 3 am…
-
- [ ] scoring class for alert (based on AlertReward)
- [ ] ~~scoring class for counting alert~~
- [x] add the possibility to return the number of "accurate simulator" used in the scoring (see the ex…
-
### Required prerequisites
- [X] I have read the documentation.
- [X] I have searched the [Issue Tracker](https://github.com/PKU-Alignment/safe-rlhf/issues) and [Discussions](https://github.com/PKU-…