-
### 🐛 Describe the bug
Hi,
There is something that is slightly unclear to me in the **summarize_rlhf** code -
I see that the tokenizer used everywhere is the pretrained tokenizer of `EleutherAI/gpt…
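For reference, a minimal sketch of that shared-tokenizer setup, assuming `transformers` and a GPT-family checkpoint; the exact name is truncated in the issue, so `MODEL_NAME` below is only a placeholder:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint: the name in the issue is truncated, so this is an assumption.
MODEL_NAME = "EleutherAI/gpt-j-6B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# GPT-style tokenizers ship without a pad token, so EOS is commonly reused for padding.
tokenizer.pad_token = tokenizer.eos_token

batch = tokenizer(
    ["a Reddit post to summarize TL;DR:"],
    truncation=True,
    max_length=512,
    return_tensors="pt",
)
print(batch["input_ids"].shape)
```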
-
Create an example showing reward modeling. This could use an artificially limited synthetic reward source, or the HHH Anthropic data (already on the Stability cluster); a toy sketch follows below.
More ideas for tasks: https://…
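One reading of "an artificially limited synthetic reward source" is a hand-written scoring function with a hard cap; a toy sketch, with every heuristic invented for illustration:

```python
# Toy synthetic reward: a keyword bonus plus a brevity bonus, hard-capped at 1.0.
# Both heuristics and the cap are invented for illustration only.
def synthetic_reward(samples: list[str]) -> list[float]:
    rewards = []
    for text in samples:
        score = 0.5 if "helpful" in text.lower() else 0.0  # keyword bonus
        score += max(0.0, 1.0 - len(text) / 512)           # brevity bonus
        rewards.append(min(score, 1.0))                    # artificial cap
    return rewards

print(synthetic_reward(["a short, helpful answer", "x" * 600]))
```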
-
Hi,
I am currently working on a psychological project on estimating model parameters using numpyro MCMC inference. However, I've found no tutorials within the numpyro documentation to guide me.…
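In the absence of a tutorial, a minimal NUTS example is sketched below; the Bernoulli choice model, the fake data, and the parameter name `theta` are all hypothetical stand-ins, not the actual psychological model:

```python
import numpy as np
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

# Hypothetical model: infer a single choice probability from binary responses.
def model(choices):
    theta = numpyro.sample("theta", dist.Beta(1.0, 1.0))
    with numpyro.plate("trials", choices.shape[0]):
        numpyro.sample("obs", dist.Bernoulli(probs=theta), obs=choices)

choices = np.random.binomial(1, 0.7, size=200)  # fake data for the sketch
mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(0), choices=choices)
mcmc.print_summary()
```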
-
Hello, Antoxnxpod! I saw you liked my repositories. Do you want to create a new project together?
-
### 🐛 Describe the bug
`0%| | 0/10000 [00:00`
-
This isn't something I think is in dire need; I just think it would be dope.
I imagine it so that you can select your party composition and a Boss Template, and the AI gives you a mitigation plan. It doe…
-
# Abstract
The entire project is composed of two milestones (milestones 1 and 2) and will take place over 12 months. It will consist of a scientific paper titled "An analysis of the two tokens NEO ec…
-
The organization I work for has a well-curated metadata catalog of datasets with a queryable autocomplete service. I work on a team that supports our Machine Learning teams, and a common feature re…
-
I want my reward function to depend on the prompt used. Mainly, I want to fine-tune an LM for a conditional generation task, e.g., summarization. It seems that the reward function expects only a list o…
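A sketch of one possible workaround, assuming the trainer really does pass only the concatenated prompt-plus-completion strings: close over the prompts and recover each one by prefix matching. `prompts`, `references`, and the word-overlap score are made up for illustration:

```python
# Made-up data: one prompt and a reference summary keyed by that prompt.
prompts = ["POST: the cat sat on the mat. TL;DR:"]
references = {prompts[0]: "a cat sat down"}

def reward_fn(samples: list[str]) -> list[float]:
    rewards = []
    for sample in samples:
        # Recover the prompt by prefix matching, then score the completion
        # against that prompt's reference with a toy word-overlap reward.
        prompt = next(p for p in prompts if sample.startswith(p))
        completion = sample[len(prompt):]
        overlap = set(completion.lower().split()) & set(references[prompt].lower().split())
        rewards.append(float(len(overlap)))
    return rewards

print(reward_fn([prompts[0] + " the cat sat down on a mat"]))
```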