reward-models Search Results

1000+ results
for reward-models

synapse-alpha/mirror-neuron #69

Compile a list of requirements to make bittensor neuron clas…

This will include specifying models such as gating and reward #67, removing forceful actions (downloading big objects, registering wallets, connecting to network). This is a moving target so make a…

steffencruz updated 6 months ago
2
CarperAI/trlx #383

Flant-t5-large Deepspeed OVERFLOW! issues + bad outputs aft…

### 🐛 Describe the bug Hi, I'm trying to use `ilql` training on custom data with `flan-t5-large` and `flan-t5-xl` models to fine-tune them using RLHF and `gpt-j-6B` as a reward model. 1. I have …

chainyo updated 1 year ago
8
thomashopkins32/Minecraft-Virtual-Intelligence #8

Map out plan to use ICM as the curiosity module

Paper here: https://arxiv.org/pdf/1705.05363.pdf Discussion: This is the first paper I have seen that could be a viable way to use curiosity in a game as large as Minecraft. The modules are alread…

thomashopkins32 updated 1 month ago
4
All-Hands-AI/OpenHands #2221

[Feature]: Retry on failure functionality

**What problem or use case are you trying to solve?** Sometimes models fail to do their job correctly, and we would benefit from starting all over from the beginning. There are a few examples of th…

neubig updated 2 weeks ago
2
nebuly-ai/optimate #224

[Chatllama] Use upvotes in Stanford dataset as a measure for…

# Description Currently we are supporting the following datasets: - [Stanford Human Preferences Dataset (SHP)](https://huggingface.co/datasets/stanfordnlp/SHP) - [Anthropic RLHF](https://huggingf…

diegofiori updated 1 year ago
8
xysun/blog #7

Paper readings Nov 2018 [3]

- [blog post] Reinforcement learning with prediction based rewards - [link](https://blog.openai.com/reinforcement-learning-with-prediction-based-rewards/) - [notes](https://github.com/xysun/…

xysun updated 5 years ago
2
junhwi/next-gen-ai #9

24/01/24

https://deepmind.google/discover/blog/alphageometry-an-olympiad-level-ai-system-for-geometry/ https://manifestai.com/blogposts/faster-after-all/ https://www.theverge.com/2024/1/18/24042354/mark-zu…

junhwi updated 7 months ago
3
azerothcore/azerothcore-wotlk #17118

[Hunter Pet] Rip-Blade Ravager doesn't shuffle between it's …

https://github.com/chromiecraft/chromiecraft/issues/6087 ### What client do you play on? enUS ### Faction Alliance ### Content Phase: Generic ### Current Behaviour it seems bes…

Annamaria-CC updated 2 months ago
4
spheronFdn/sos-ai-bounty #8

MAHA AI HUB Decentralized AI Marketplace within Telegram Min…

# MAHA AI HUB Decentralized AI Marketplace within Telegram Mini Apps ## Submitter Information - **Name:** Shlok Jagtap, **Email:** shlokjagtap.0608@gmail.com ## Reward Address 0xaE25777C98EE8a…

DeImOs-Sj updated 4 days ago
2
OptimalScale/LMFlow #861

[BUG] The text cannot be generated successfully during the R…

**Describe the bug** When I use the fine-tuned LLAMA3 model to run the `examples/raft_align.py` script, I encountered the following error: ``` Traceback (most recent call last): File "/home/work…

biaoliu-kiritsugu updated 2 months ago
1

