-
**Describe the bug**
When I used the fine-tuned Llama 3 model to run the `examples/raft_align.py` script, I encountered the following error:
```
Traceback (most recent call last):
File "/home/work…
-
Hi @Aligner2024,
May I know how the harmlessness and helpfulness scores in Figure 2 are calculated?
I also noticed that you changed equation (2); may I know the reason?
And the code here only cover t…
-
Paper here: https://arxiv.org/pdf/1705.05363.pdf
Discussion:
This is the first paper I have seen that presents a viable way to use curiosity in a game as large as Minecraft. The modules are alread…
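The paper's core idea is to use the forward model's prediction error in a learned feature space as an intrinsic "curiosity" reward. A minimal NumPy sketch of that reward computation, with the feature encoder and forward model stubbed out as fixed random linear maps (hypothetical stand-ins for the learned networks):

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, FEAT_DIM, N_ACTIONS = 16, 8, 4
ETA = 0.5  # scaling factor for the intrinsic reward, as in the paper

# Stand-ins for the learned networks: phi encodes a raw state into
# features; the forward model predicts phi(s_{t+1}) from phi(s_t) and a_t.
W_phi = rng.normal(size=(FEAT_DIM, STATE_DIM))
W_fwd = rng.normal(size=(FEAT_DIM, FEAT_DIM + N_ACTIONS))

def phi(state):
    return W_phi @ state

def forward_model(feat, action):
    one_hot = np.eye(N_ACTIONS)[action]
    return W_fwd @ np.concatenate([feat, one_hot])

def intrinsic_reward(s_t, a_t, s_next):
    """Curiosity bonus: squared error between predicted and actual next features."""
    pred = forward_model(phi(s_t), a_t)
    return ETA / 2.0 * np.sum((pred - phi(s_next)) ** 2)

s_t, s_next = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
bonus = intrinsic_reward(s_t, a_t=1, s_next=s_next)
print(bonus >= 0.0)  # the bonus is a squared norm, so never negative
```

In the real ICM both networks are trained jointly (the encoder via an inverse-dynamics loss), which is what keeps the reward focused on controllable parts of the environment.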
-
### System Info
```Shell
PyTorch 2.2.1
DeepSpeed 0.13.4
```
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] One of the scripts in the examples/ …
-
# `TransformerDecoder` Refactor
**Authors:**
* @SalmanMohammadi
with input from:
* @kartikayk
* @ebsmothers
* @pbontrager
## **Summary**
Refactoring `TransformerDecoder` to offer additi…
-
Team, thank you so much for this wonderful toolkit! We are trying to test the vLLM setting with the mistralai/Mistral-7B-Instruct-v0.2 model under ZeRO-2.
![image](https://github.com/OpenLLMAI/OpenRLHF/a…
-
# Why
#### As a
user of `pyCMO`
#### I want
to be able to specify different reward models for my scenarios
#### So that
I can train RL agents
# Acceptance Criteria
#### Given
we currently only expo…
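One way to satisfy this story is a small pluggable interface plus a registry keyed by scenario config. The sketch below is purely hypothetical; `RewardModel`, `score`, and the registry names are illustrative, not pyCMO's actual API:

```python
from abc import ABC, abstractmethod

class RewardModel(ABC):
    """Hypothetical pluggable reward interface (not pyCMO's real API)."""

    @abstractmethod
    def score(self, observation: dict) -> float:
        """Map one scenario observation to a scalar reward."""

class UnitSurvivalReward(RewardModel):
    """Example: reward proportional to friendly units still alive."""

    def score(self, observation: dict) -> float:
        return float(len(observation.get("friendly_units", [])))

# Scenarios could then name a reward model in their config.
REWARD_REGISTRY = {"unit_survival": UnitSurvivalReward}

def make_reward(name: str) -> RewardModel:
    return REWARD_REGISTRY[name]()

reward = make_reward("unit_survival").score({"friendly_units": ["f16", "destroyer"]})
print(reward)  # 2.0
```

The registry indirection keeps scenario definitions declarative: swapping reward functions becomes a config change rather than a code change.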
-
Hello.
First of all, thank you for making your source code public.
I have a question.
I wonder what Reward means in this model.
Usually, the Q-value serves as the reward signal for other models, but I think this model is differe…
-
"The generative process is the same as in auto-regressive language models: generation begins with an empty string, and at the 𝑖-th step a token 𝑧𝑖 is sampled"
Since the generative process is conduc…
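The quoted process can be sketched with a toy sampler; `next_token_probs` below is a placeholder for the model's next-token distribution, and the vocabulary is invented for illustration:

```python
import random

VOCAB = ["the", "cat", "sat", "<eos>"]

def next_token_probs(prefix):
    # Placeholder for the language model: a fixed distribution that
    # increasingly favors <eos> as the prefix grows longer.
    p_eos = min(1.0, 0.2 * len(prefix))
    rest = (1.0 - p_eos) / 3
    return [rest, rest, rest, p_eos]

def generate(max_steps=20, seed=0):
    rng = random.Random(seed)
    tokens = []  # generation begins with an empty string
    for _ in range(max_steps):
        # at the i-th step a token z_i is sampled from p(. | z_1..z_{i-1})
        z_i = rng.choices(VOCAB, weights=next_token_probs(tokens))[0]
        if z_i == "<eos>":
            break
        tokens.append(z_i)
    return tokens

print(generate())
```

Each sampled token is appended to the prefix before the next step, which is what makes the process auto-regressive.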
-
I find the reward function to be the most important part of RLHF, because it is the part that mimics a human evaluator, providing instant feedback to the model.
However, due to ChatGPT's wide rang…
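A standard way such a reward model is trained in RLHF pipelines (though not necessarily ChatGPT's exact recipe) is a pairwise Bradley-Terry loss over human-preferred vs. rejected responses; a minimal NumPy sketch:

```python
import numpy as np

def pairwise_loss(r_chosen, r_rejected):
    """-log sigmoid(r_chosen - r_rejected), written via log1p for stability.

    The loss is small when the preferred response scores higher than the
    rejected one, so minimizing it teaches the model the human ranking.
    """
    margin = np.asarray(r_chosen, dtype=float) - np.asarray(r_rejected, dtype=float)
    return float(np.mean(np.log1p(np.exp(-margin))))

# A correctly ordered pair incurs a smaller loss than a mis-ordered one.
good = pairwise_loss(r_chosen=2.0, r_rejected=-1.0)
bad = pairwise_loss(r_chosen=-1.0, r_rejected=2.0)
print(good < bad)  # True
```

Because the loss depends only on score *differences*, the reward model learns a relative ranking rather than an absolute scale, which is all the downstream RL step needs.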