-
This will include specifying models such as gating and reward (#67) and removing forceful actions (downloading big objects, registering wallets, connecting to the network).
This is a moving target so make a…
-
### 🐛 Describe the bug
Hi, I'm trying to use `ilql` training on custom data to fine-tune the `flan-t5-large` and `flan-t5-xl` models with RLHF, using `gpt-j-6B` as the reward model.
1. I have …
-
Paper here: https://arxiv.org/pdf/1705.05363.pdf
Discussion:
This is the first paper I have seen that could be a viable way to use curiosity in a game as large as Minecraft. The modules are alread…
-
**What problem or use case are you trying to solve?**
Sometimes models fail to do their job correctly, and we would benefit from starting all over from the beginning. There are a few examples of th…
-
# Description
We currently support the following datasets:
- [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP)
- [Anthropic RLHF](https://huggingf…
-
- [blog post] Reinforcement learning with prediction based rewards
  - [link](https://blog.openai.com/reinforcement-learning-with-prediction-based-rewards/)
  - [notes](https://github.com/xysun/…
-
https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/
https://manifestai.com/blogposts/faster-after-all/
https://www.theverge.com/2024/1/18/24042354/mark-zu…
-
https://github.com/chromiecraft/chromiecraft/issues/6087
### What client do you play on?
enUS
### Faction
Alliance
### Content Phase:
Generic
### Current Behaviour
it seems bes…
-
# MAHA AI HUB Decentralized AI Marketplace within Telegram Mini Apps
## Submitter Information
- **Name:** Shlok Jagtap, **Email:** shlokjagtap.0608@gmail.com
## Reward Address
0xaE25777C98EE8a…
-
**Describe the bug**
When I used the fine-tuned LLAMA3 model to run the `examples/raft_align.py` script, I encountered the following error:
```
Traceback (most recent call last):
File "/home/work…