-
I wonder whether torchtune can support traditional tasks such as translation, or more general text-generation tasks that have an input column and an output column. I have read the datasets doc [here](https://…
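If the built-in dataset builders don't cover this case directly, one workaround is to pre-map each row into the input/output pair shape that instruction-style datasets expect. This is only a sketch: the column names (`en`/`de`) and the prompt template are assumptions, not torchtune's actual schema.

```python
# Sketch only: map a translation-style row (source/target columns) into
# an input/output pair. The "en"/"de" column names and the prompt
# template are illustrative assumptions, not torchtune's real schema.
def to_input_output(example: dict) -> dict:
    return {
        "input": f"Translate English to German: {example['en']}",
        "output": example["de"],
    }

rows = [{"en": "Good morning", "de": "Guten Morgen"}]
pairs = [to_input_output(r) for r in rows]
```

The resulting `pairs` list has the same two-column structure that instruct-style fine-tuning pipelines typically consume.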
-
Hi, currently `reward_fn` is independent of the environment class (`mbrl.models.ModelEnv`) and accepts actions and next observations as input. In practice, more general, environment-parameter-dependent re…
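As a concrete sketch of the signature described above: a batched reward function that depends only on actions and next observations could look like the following (the quadratic cost itself is made up for illustration and is not part of mbrl).

```python
import numpy as np

# Illustrative reward_fn with the signature described above: it takes
# batched actions and next observations and returns one reward per row.
# The quadratic cost terms are placeholders, not mbrl's actual code.
def reward_fn(actions: np.ndarray, next_obs: np.ndarray) -> np.ndarray:
    state_cost = (next_obs ** 2).sum(axis=1)        # distance-from-origin penalty
    action_cost = 0.01 * (actions ** 2).sum(axis=1)  # control-effort penalty
    return -(state_cost + action_cost)               # shape: (batch,)
```

The limitation in the issue is that nothing in this signature gives the function access to environment parameters, which is what a more general, environment-aware reward would need.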
-
### Describe the feature
PPO training needs to maintain four models in memory at the same time. The original implementation keeps the reward, actor, critic, and initial models in video RAM simultaneously.
…
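One common mitigation for this four-model footprint, sketched here framework-agnostically, is to park the models that are only needed for scoring (reward and initial/reference models) off-GPU and move each one onto the device only for its forward pass. `DummyModel` below is a placeholder standing in for a real module (e.g. a `torch.nn.Module`).

```python
# Sketch of on-demand offloading: keep a model on CPU and move it to the
# accelerator only while it is needed, freeing VRAM for the actor/critic.
# DummyModel is a placeholder for a real model class with a .to() method.
class DummyModel:
    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self

    def __call__(self, batch):
        return [0.0] * len(batch)  # placeholder scores

def score_with_offload(model, batch, device="cuda"):
    model.to(device)   # load onto the GPU only for this call
    scores = model(batch)
    model.to("cpu")    # release VRAM immediately afterwards
    return scores
```

The trade-off is extra host-device transfer time per step in exchange for a smaller peak VRAM footprint.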
-
*This is an excerpt of main thread for RoboSats PRO* https://github.com/Reckless-Satoshi/robosats/issues/177#issuecomment-1289175371
### Toolbar for RoboSats PRO
A simple component with a few but…
-
Select a series of models to be used in the project. They will be fine-tuned and architecturally manipulated (e.g., replacing the last layer for the reward model), and RLHF will be performed on all of them.
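The architectural manipulation mentioned above can be sketched as follows; this is an illustrative pattern, not any specific repository's code: replace a pretrained model's final projection with a scalar value head to obtain a reward model.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the last-layer replacement described above:
# wrap a pretrained trunk (final layer removed) with a new scalar head
# so the network outputs one reward per input instead of logits.
class RewardModel(nn.Module):
    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone                     # trunk, last layer removed
        self.value_head = nn.Linear(hidden_size, 1)  # new scalar reward head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(x)                    # (batch, hidden_size)
        return self.value_head(hidden).squeeze(-1)   # (batch,) scalar rewards
```

Only the head is new; the backbone keeps its pretrained weights and is fine-tuned during RLHF.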
-
Hi,
Thank you to the LeRobot community for maintaining such a fantastic codebase. My research group and I have greatly benefited from your efforts. In my current project, I am using the repository …
-
Hello, I followed the steps outlined in "InstructVideo (CVPR 2024)". I'm trying to run the evaluation step `bash configs/instructvideo/eval_generate_videos.sh`, but I encounter the error below. I checke…
-
Dear authors,
Thanks for the amazing work. Recently I followed the expert actions that I extracted from the `get_info()` function of the `AlfredThorEnv` class; however, the success rate is only slight…
-
# Why
#### As a
user of `pyCMO`
#### I want
to be able to specify different reward models for my scenarios
#### So that
I can train RL agents
# Acceptance Criteria
#### Given
we currently only expo…
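One possible shape for this feature is a registry of named reward functions that map consecutive scenario states to a scalar. Everything below is hypothetical: the callable signature, the `"score"` state key, and the registry are sketched assumptions, not pyCMO's actual API.

```python
from typing import Callable, Dict

# Hypothetical interface sketch: a reward model is any callable mapping
# (previous_state, new_state) dicts to a float. The "score" key is an
# assumption for illustration, not pyCMO's real state schema.
RewardFn = Callable[[dict, dict], float]

def score_delta(prev_state: dict, new_state: dict) -> float:
    """Reward the change in scenario score between steps."""
    return new_state.get("score", 0.0) - prev_state.get("score", 0.0)

# Users could then select a reward model by name when building a scenario.
REWARD_MODELS: Dict[str, RewardFn] = {"score_delta": score_delta}
```

Registering functions by name keeps the environment code unchanged while letting each scenario pick its own reward model.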
-
I converted a Llama model to NeMo, with model directories as below:
![image](https://github.com/NVIDIA/NeMo-Aligner/assets/6756880/2d36915a-a0ab-4c1a-8d20-0960a7948bdc)
When I tried to load it to train a…