-
This bounty is a debate bounty (our very first one).
The 2019 token bear market is going to weed out a lot of token models that *are not working*.
I would like to hear people's predictions for …
-
I was wondering if there was ever discussion on using the URL agents in a package. For example, I'm working in an environment with discrete action spaces, so I need a different training script, but w…
-
Hi, I am trying to use your code. However, I noticed this repo is not a completed version, as training data is missing.
Is there more detailed documentation on how to use your code? And if you can…
-
I'm taking part in an AWS-run community time trial race at work. I am only using the DeepRacer console, no custom SageMaker or ROS setup.
I'm getting this error, and I have no idea why - is it …
-
### Proposal
With the release of the `MuJoCo-v5` environments in Gymnasium 1.0.0 (which will be coming out prior to the heat death of the universe), we need tutorials on:
- [x] loading a q…
-
Why does the reward model use `mean(values[:, :-1], dim=1)` as its output?
```python
values = self.value_head(last_hidden_states)[:, :-1]
value = values.mean(dim=1).squeeze(1) # ensure shape is (B)
```
http…
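A minimal shape sketch of what the quoted two lines do, assuming the usual setup where `value_head` is a `Linear(H, 1)` applied to hidden states of shape `(B, T, H)` (the batch/sequence sizes below are made up for illustration):

```python
import torch

# Hypothetical sizes: batch B=2, sequence length T=5, hidden size H=8.
B, T, H = 2, 5, 8
last_hidden_states = torch.randn(B, T, H)
value_head = torch.nn.Linear(H, 1)  # stand-in for self.value_head

values = value_head(last_hidden_states)[:, :-1]  # (B, T-1, 1): drop the last position
value = values.mean(dim=1).squeeze(1)            # mean over time: (B, 1) -> (B,)

print(values.shape, value.shape)
```

So the `squeeze(1)` is what makes the comment's "ensure shape is (B)" true: the mean over the time dimension leaves a trailing singleton dim of the per-token value head, which `squeeze(1)` removes.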
-
![image](https://github.com/usail-hkust/LLMTSCS/assets/56549016/e104b604-e95f-4340-928d-4081c24cc20b)
The `neg_detach` and `boundary` options have no preset values; how should I set them to True or False?
-
Hello, I want to run train_ppo_llama_ray.sh on 4 RTX 4090s. Should I modify the actor_num_gpus_per_node/critic_num_gpus_per_node in train_ppo_llama_ray.sh? As the default script is for 8 GPUs, what el…
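A hedged sketch of what such an override might look like for a single 4-GPU node. The two flag names come from the question itself; the exact split, and whether other per-node flags (e.g. for reference/reward models) exist, depend on the script version, so treat this as an assumption to check against your copy of train_ppo_llama_ray.sh:

```shell
# Sketch only: shrink per-node GPU counts so actor + critic fit on 4 GPUs.
# If your script version also takes reference/reward GPU flags, scale those
# down the same way, and expect to reduce micro batch sizes as well.
--actor_num_gpus_per_node 2 \
--critic_num_gpus_per_node 2 \
```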
-
(temp) C:\Users\IM-LP-1453\exposure>python evaluate.py example pretrained models/sample_inputs/*.tif
Traceback (most recent call last):
  File "evaluate.py", line 4, in <module>
    from net import GAN
…
-
I'm frustrated trying to implement PPO using DeepSpeed, which needs to run the actor, critic, and reward model at the same time.
It seems that DeepSpeed cannot support running multiple models…
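For what it's worth, the underlying pattern — several models with separate optimizers in one process, as PPO requires — is unproblematic in plain PyTorch; with DeepSpeed the usual route is one `deepspeed.initialize()` call per model, yielding one engine per model. A plain-PyTorch sketch of the shape of it (the `Linear` models and sizes are stand-ins, not the actual PPO networks):

```python
import torch
import torch.nn as nn

# Stand-in models: in real PPO these would be the actor/critic/reward nets.
actor = nn.Linear(16, 16)
critic = nn.Linear(16, 1)
reward_model = nn.Linear(16, 1)

# One optimizer per trainable model; the reward model stays frozen.
optimizers = {
    "actor": torch.optim.Adam(actor.parameters(), lr=1e-4),
    "critic": torch.optim.Adam(critic.parameters(), lr=1e-4),
}

x = torch.randn(4, 16)
with torch.no_grad():        # reward model is inference-only in PPO
    r = reward_model(x)
logits = actor(x)
v = critic(x)
print(logits.shape, v.shape, r.shape)
```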