-
### Describe the bug
![image](https://github.com/user-attachments/assets/6675452a-826e-4aa1-9e00-c218065d1279)
As can be seen in the screenshot, the `RL` value shows only `-`. I believe this is due …
-
For KIS tasks it would be good to reward finding the correct video, because a video often contains similar or even identical scenes (for example in news videos, where there is a preview of a scene). Such…
-
## Context
Validator payouts are lazy and paged, meaning that for each era and page of nominators (see [MaxExposurePageSize](https://paritytech.github.io/polkadot-sdk/master/pallet_staking/trait.Config.…
-
**Describe the solution you'd like**
Currently, whether or not an effect has a cooldown and the duration of that cooldown are only manageable in the UI; it'd be nice to allow this to be updated via e…
-
After training the RM (step1-step3) with SteerLM, I get a reward model (.nemo). Is this the final reward model?
The Nemotron-4-340B technical report shows the performance of the reward model on reward-bench
…
-
I'm trying to use the "OVERVIEW" rewards screen, but I'm not able to: it stays on "COMPACT" even when I change it to either "DEFAULT" or "OVERVIEW". I'm also not able to remove the list of special…
-
While not clear yet, it is likely that killing opponents or laying bombs next to them will rarely happen during normal training. In this case one might need to make use of suitably shaped rewards (tha…
-
It may be worth collecting some ideas here of what to reward.
Obvious candidates should be:
- coins collected (+)
- opponents killed (+)
- winning the game (+)
- getting killed by a bomb (-)
maybe also to thi…
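
One way the reward ideas listed above could be written down is as a simple event-to-reward table. The following is only a minimal sketch: the event names and the reward magnitudes are hypothetical placeholders chosen for illustration, not values taken from the issue or from any particular framework.

```python
# Hypothetical event names and reward magnitudes, for illustration only.
# Positive values match the (+) items above, negative values the (-) items.
REWARDS = {
    "coin_collected": 1.0,
    "opponent_killed": 5.0,
    "game_won": 10.0,
    "killed_by_bomb": -5.0,
}


def reward_from_events(events):
    """Sum the rewards of all events seen in one step; unknown events count as 0."""
    return sum(REWARDS.get(event, 0.0) for event in events)


# Two coins collected, then killed by a bomb in the same step.
print(reward_from_events(["coin_collected", "coin_collected", "killed_by_bomb"]))  # -3.0
```

Keeping the values in a single table makes it easy to add further ideas or to rebalance the signs and magnitudes later.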
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md)…
-
Currently, rewards do not work as intended when applied to a range of markets, such as all markets with the same settlement asset.
Users on long tail markets receive almost no rewards compare…