reward-models Search Results

1000+ results for reward-models

OptimalScale/LMFlow #861

[BUG] The text cannot be generated successfully during the R…

**Describe the bug** When I use the fine-tuned LLAMA3 model to run the `examples/raft_align.py` script, I encountered the following error: ``` Traceback (most recent call last): File "/home/work…

biaoliu-kiritsugu updated 2 months ago
1
QwenLM/Qwen2 #909

About the reward model

As shown in the image below: ![image](https://github.com/user-attachments/assets/bda650bf-28fe-4018-bee9-c98d517123da) I could not find the reward model on ModelScope or Hugging Face. Has it not been released?

RyanOvO updated 17 hours ago
2
isk03276/LearnToMoveUR3 #3

QMutex: destroying locked mutex

"Hello, when I use the command to run the pre-trained model you provided, the following issue occurs. Could you please tell me the reason for this? python main.py --env-id reach --load-from pretrai…

deavn2236 updated 8 months ago
1
moves-rwth/storm #498

symbolic bisimulation changes results

Hello everyone, I am currently working on branch 1.8.0 (due to compatibility with stormpy) and trying to solve timed reachability properties for Markov automata. I have encountered a difference …

temunds updated 6 months ago
4
CarperAI/trlx #581

Multi-GPU training errors with peft

### 🐛 Describe the bug When I try to use multi-gpu training with accelerate I get an error. Code: ``` import trlx from peft import LoraConfig, TaskType from trlx.data.configs import ( Mod…

AliengirlLiv updated 1 month ago
1
drbenvincent/delay-discounting-analysis #147

new discount function: Exponential-Power

My version: V = reward * exp(-a*(delay^b)). This is very similar to the discount function proposed by Ebert & Prelec (2007): V = reward * exp(-(a*delay)^b) - [x] add to unit tests …

drbenvincent updated 7 years ago
2
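
The two discount functions quoted in this issue differ only in where the exponent `b` is applied. A minimal Python sketch of both follows, for illustration only; the function names and the sample parameter values are assumptions and do not come from the repository.

```python
import math


def exponential_power(reward, delay, a, b):
    """Issue author's form: V = reward * exp(-a * delay**b)."""
    return reward * math.exp(-a * delay ** b)


def ebert_prelec(reward, delay, a, b):
    """Ebert & Prelec (2007) form: V = reward * exp(-(a * delay)**b)."""
    return reward * math.exp(-((a * delay) ** b))


if __name__ == "__main__":
    # Illustrative values only (hypothetical, not taken from the issue).
    reward, delay, a, b = 100.0, 10.0, 0.05, 2.0
    print(exponential_power(reward, delay, a, b))
    print(ebert_prelec(reward, delay, a, b))
```

With `b = 1` both forms reduce to the same exponential discount; they diverge when `b != 1` and `a != 1`, because `a` is raised to the power `b` only in the second form.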
microsoft/malmo #848

Content is not allowed in prolog

Hi. I'm new to Malmo, so... I receive this error when trying to use MalmoEnv with the default_minecraft.xml file. XML File: ` Everyday Minecraft life: survival …

reinforcedagent updated 4 years ago
3
bstee615/rarl #6

Evaluate RARL

Now that RARL is implemented, perform evaluations on the same scale as the original paper to demonstrate the same results. - [x] Train 1 model, evaluate cumulative reward across 100 random seeds …

bstee615 updated 3 years ago
3
ray-project/ray #45655

[RLlib] Unable to replicate original PPO performance

### What happened + What you expected to happen I can’t seem to replicate the original [PPO](https://arxiv.org/pdf/1707.06347) algorithm's performance when using RLlib's PPO implementation. The hyp…

rajfly updated 3 months ago
1
Farama-Foundation/Gymnasium #652

[Proposal] Inclusion of ActionRepeat wrapper

### Proposal I would like to propose `ActionRepeat` wrapper that would allow the wrapped environment to repeat `step()` for the specified number of times. ### Motivation I am working on imple…

smmislam updated 5 months ago
2
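
The idea behind the proposed `ActionRepeat` wrapper can be sketched without any dependencies. This is not the proposal's implementation: the issue text is truncated here, so summing rewards across repeats and stopping early on termination are assumptions, and `CountingEnv` is a hypothetical toy environment. The only thing taken from Gymnasium is the five-value `step()` return shape `(obs, reward, terminated, truncated, info)`.

```python
class ActionRepeat:
    """Repeat each action `repeat` times on the wrapped environment (sketch)."""

    def __init__(self, env, repeat):
        if repeat < 1:
            raise ValueError("repeat must be >= 1")
        self.env = env
        self.repeat = repeat

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward  # assumption: rewards are summed
            if terminated or truncated:
                break  # assumption: stop repeating once the episode ends
        return obs, total_reward, terminated, truncated, info


class CountingEnv:
    """Hypothetical toy env: reward 1.0 per step, terminates after `limit` steps."""

    def __init__(self, limit=5):
        self.limit = limit
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.limit, False, {}


if __name__ == "__main__":
    env = ActionRepeat(CountingEnv(limit=5), repeat=3)
    env.reset()
    print(env.step(0))
    print(env.step(0))
```

The early `break` keeps the wrapper from stepping an environment that has already ended; a real wrapper might instead be built on `gymnasium.Wrapper`.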
