-
This bounty is a debate bounty (our very first one).
The 2019 token bear market is going to weed out a lot of token models that *are not working*.
I would like to hear people's predictions for …
-
I was wondering if there was ever discussion on using the URL agents in a package. For example, I'm working in an environment with discrete action spaces, so I need a different training script, but w…
-
Hi, I am trying to use your code. However, I noticed this repo is not a completed version, as training data is missing.
Is there more detailed documentation on how to use your code? And if you can…
-
I'm taking part in an AWS-run community time trial race at work. I am only using the DeepRacer console, no custom SageMaker or ROS setup.
I'm getting this error, and I have no idea why - is it …
-
### Proposal
With the release of the `MuJoCo-v5` environments in Gymnasium 1.0.0 (which will be coming out prior to the heat death of the universe), we need tutorials on:
- [x] loading a q…
-
Why does the reward model use `mean(values[:, :-1], dim=1)` as its output?
```python
values = self.value_head(last_hidden_states)[:, :-1]
value = values.mean(dim=1).squeeze(1) # ensure shape is (B)
```
http…
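A minimal shape sketch of what the quoted two lines do, assuming the usual setup where `value_head` is a `Linear(H, 1)` applied to hidden states of shape `(B, T, H)` (the batch/sequence sizes below are made up for illustration):

```python
import torch

# Hypothetical sizes: batch B=2, sequence length T=5, hidden size H=8.
B, T, H = 2, 5, 8
last_hidden_states = torch.randn(B, T, H)
value_head = torch.nn.Linear(H, 1)  # stand-in for self.value_head

values = value_head(last_hidden_states)[:, :-1]  # (B, T-1, 1): drop the last position
value = values.mean(dim=1).squeeze(1)            # mean over time: (B, 1) -> (B,)

print(values.shape, value.shape)
```

So the `squeeze(1)` is what makes the comment's "ensure shape is (B)" true: the mean over the time dimension leaves a trailing singleton dim of the per-token value head, which `squeeze(1)` removes.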
-
![image](https://github.com/usail-hkust/LLMTSCS/assets/56549016/e104b604-e95f-4340-928d-4081c24cc20b)
The `neg_detach` and `boundary` options have no preset values; how should I set them to True or False?
-
Hello, I want to run train_ppo_llama_ray.sh on 4 RTX 4090s. Should I modify the actor_num_gpus_per_node/critic_num_gpus_per_node in train_ppo_llama_ray.sh? As the default script is for 8 GPUs, what el…
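A hedged sketch of what such an override might look like for a single 4-GPU node. The two flag names come from the question itself; the exact split, and whether other per-node flags (e.g. for reference/reward models) exist, depend on the script version, so treat this as an assumption to check against your copy of train_ppo_llama_ray.sh:

```shell
# Sketch only: shrink per-node GPU counts so actor + critic fit on 4 GPUs.
# If your script version also takes reference/reward GPU flags, scale those
# down the same way, and expect to reduce micro batch sizes as well.
--actor_num_gpus_per_node 2 \
--critic_num_gpus_per_node 2 \
```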
-
(temp) C:\Users\IM-LP-1453\exposure>python evaluate.py example pretrained models/sample_inputs/*.tif
Traceback (most recent call last):
  File "evaluate.py", line 4, in <module>
    from net import GAN
…
-
I'm frustrated trying to implement PPO using DeepSpeed, which needs to run the actor, critic, and reward model at the same time.
It seems that DeepSpeed cannot support running multiple models…
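For what it's worth, the underlying pattern — several models with separate optimizers in one process, as PPO requires — is unproblematic in plain PyTorch; with DeepSpeed the usual route is one `deepspeed.initialize()` call per model, yielding one engine per model. A plain-PyTorch sketch of the shape of it (the `Linear` models and sizes are stand-ins, not the actual PPO networks):

```python
import torch
import torch.nn as nn

# Stand-in models: in real PPO these would be the actor/critic/reward nets.
actor = nn.Linear(16, 16)
critic = nn.Linear(16, 1)
reward_model = nn.Linear(16, 1)

# One optimizer per trainable model; the reward model stays frozen.
optimizers = {
    "actor": torch.optim.Adam(actor.parameters(), lr=1e-4),
    "critic": torch.optim.Adam(critic.parameters(), lr=1e-4),
}

x = torch.randn(4, 16)
with torch.no_grad():        # reward model is inference-only in PPO
    r = reward_model(x)
logits = actor(x)
v = critic(x)
print(logits.shape, v.shape, r.shape)
```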