-
**Describe the bug**
When I ran the `examples/raft_align.py` script with the fine-tuned LLAMA3 model, I encountered the following error:
```
Traceback (most recent call last):
File "/home/work…
-
As shown in the image below:
![image](https://github.com/user-attachments/assets/bda650bf-28fe-4018-bee9-c98d517123da)
I could not find the rw model on ModelScope or Hugging Face. Has it not been released yet?
-
Hello, when I run the pre-trained model you provided with the command below, the following issue occurs. Could you please tell me the reason for this?
python main.py --env-id reach --load-from pretrai…
-
Hello everyone,
I am currently working on branch 1.8.0 (for compatibility with stormpy) and trying to solve timed reachability properties for Markov automata.
I have encountered a difference …
-
### 🐛 Describe the bug
When I try to use multi-GPU training with accelerate, I get an error.
Code:
```
import trlx
from peft import LoraConfig, TaskType
from trlx.data.configs import (
Mod…
-
My version
V = reward * exp(-a*(delay^b))
This is very similar to the discount function proposed by Ebert & Prelec (2007):
V = reward * exp(-(a*delay)^b)
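
To make the comparison concrete, here is a minimal sketch of the two discount functions above in Python. The function names are hypothetical and chosen only for illustration; the parameter names `reward`, `delay`, `a`, and `b` follow the formulas. Since `(a*delay)^b = a^b * delay^b`, the two forms appear to differ only in how `a` is scaled (for non-negative `a` and `delay`), and they coincide when `b = 1`.

```python
import math


def discount_modified(reward, delay, a, b):
    """'My version': V = reward * exp(-a * delay**b)."""
    return reward * math.exp(-a * (delay ** b))


def discount_ebert_prelec(reward, delay, a, b):
    """Ebert & Prelec (2007) form: V = reward * exp(-(a * delay)**b)."""
    return reward * math.exp(-((a * delay) ** b))


# With b = 1 both forms reduce to plain exponential discounting.
v1 = discount_modified(10.0, 5.0, 0.1, 1.0)
v2 = discount_ebert_prelec(10.0, 5.0, 0.1, 1.0)

# For b != 1, passing a**b to the modified form reproduces Ebert & Prelec.
a_ep, b = 0.2, 0.5
v3 = discount_modified(10.0, 5.0, a_ep ** b, b)
v4 = discount_ebert_prelec(10.0, 5.0, a_ep, b)
```

Under these assumptions, a unit test could check exactly this reparameterisation rather than hard-coded values.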
- [x] add to unit tests
…
-
Hi. I'm new to Malmo, and I receive this error when trying to use MalmoEnv with the `default_minecraft.xml` file.
XML File:
`
Everyday Minecraft life: survival
…
-
Now that RARL is implemented, perform evaluations on the same scale as the original paper to demonstrate the same results.
- [x] Train 1 model, evaluate cumulative reward across 100 random seeds
…
-
### What happened + What you expected to happen
I can’t seem to replicate the original [PPO](https://arxiv.org/pdf/1707.06347) algorithm's performance when using RLlib's PPO implementation. The hyp…
-
### Proposal
I would like to propose an `ActionRepeat` wrapper that makes the wrapped environment repeat `step()` with the same action a specified number of times.
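
A rough sketch of what such a wrapper might look like follows. It is not the proposed implementation: it assumes a Gymnasium-style `step()` that returns `(obs, reward, terminated, truncated, info)`, it sums rewards over the repeated steps, and it stops early when the episode ends. The `num_repeats` parameter and the toy `CountingEnv` are hypothetical names used only for illustration, and the sketch avoids importing any RL library so that it runs on its own.

```python
class ActionRepeat:
    """Repeat each action `num_repeats` times on the wrapped environment."""

    def __init__(self, env, num_repeats):
        if num_repeats < 1:
            raise ValueError("num_repeats must be at least 1")
        self.env = env
        self.num_repeats = num_repeats

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.num_repeats):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward  # assumption: rewards are summed
            if terminated or truncated:
                break  # do not step past the end of the episode
        return obs, total_reward, terminated, truncated, info


class CountingEnv:
    """Hypothetical toy environment: reward 1 per step, ends after `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self, **kwargs):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon, False, {}


env = ActionRepeat(CountingEnv(horizon=5), num_repeats=3)
env.reset()
first = env.step(0)   # three inner steps
second = env.step(0)  # episode ends after two more inner steps
```

Summing the rewards and breaking on termination are design choices in this sketch; returning only the last reward, or collecting the intermediate observations in `info`, would be equally plausible alternatives.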
### Motivation
I am working on imple…