Stochastic GCG for optimizing prompts on datasets

amanb2000 / Magic_Words

Code for the paper "What's the Magic Word? A Control Theory of LLM Prompting"

MIT License


Let's generalize the [easy_gcg]() code to optimize prompts over a dataset of (x, y) pairs, where each x is a question and y is its answer.

We want to solve u* = argmax_u E [P(y | u + x)], where u is the prompt being optimized, u + x is the prompt concatenated with the question, and the expectation is taken over the dataset (x, y) ~ D.

We can start by simply aggregating gradients for the swaps in GCG over multiple elements of the batch (https://github.com/amanb2000/Magic_Words/blob/32840cd867c83fc131205e5ff639a109f4e4f78c/magic_words/easy_gcg.py#L178).
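A minimal sketch of that aggregation step, assuming each example in the batch yields a gradient of its loss with respect to the one-hot prompt tokens, shaped [prompt_len][vocab_size]. The names `aggregate_gradients` and `top_k_swaps` are hypothetical and not taken from easy_gcg.py, and plain Python lists stand in for tensors.

```python
def aggregate_gradients(per_example_grads):
    """Average per-example gradients, each shaped [prompt_len][vocab_size]."""
    n = len(per_example_grads)
    prompt_len = len(per_example_grads[0])
    vocab_size = len(per_example_grads[0][0])
    return [
        [sum(g[i][v] for g in per_example_grads) / n for v in range(vocab_size)]
        for i in range(prompt_len)
    ]


def top_k_swaps(avg_grad, k):
    """For each prompt position, return the k token ids with the most negative
    averaged loss gradient, i.e. the swaps expected to reduce the loss most."""
    return [
        sorted(range(len(row)), key=lambda v: row[v])[:k]
        for row in avg_grad
    ]


# Toy batch: 2 examples, prompt length 2, vocabulary of 3 tokens.
grads = [
    [[0.5, -1.0, 0.0], [0.2, 0.1, -0.3]],
    [[0.1, -0.2, -2.0], [0.0, 0.3, -0.1]],
]
avg = aggregate_gradients(grads)
candidates = top_k_swaps(avg, k=2)
```

Averaging and summing give the same per-position ranking, so either works for choosing swap candidates. Averaging keeps the gradient scale independent of the batch size.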

All that remains is to create an efficient batch_compute_score_dataset() function that computes the score of each candidate prompt w.r.t. the dataset (https://github.com/amanb2000/Magic_Words/blob/32840cd867c83fc131205e5ff639a109f4e4f78c/magic_words/easy_gcg.py#L263).
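One possible shape for such a function, sketched under assumptions: `prob_fn(prompt, x, y)` is a hypothetical callable returning P(y | prompt + x) for a single example, and the signature below is illustrative, not the one in easy_gcg.py. A real implementation would batch the model's forward passes instead of looping in Python.

```python
def batch_compute_score_dataset(candidate_prompts, dataset, prob_fn):
    """Return the mean of P(y | u + x) over the dataset for each candidate u."""
    scores = []
    for u in candidate_prompts:
        total = sum(prob_fn(u, x, y) for x, y in dataset)
        scores.append(total / len(dataset))
    return scores


def toy_prob(prompt, x, y):
    """Toy stand-in for a model: high probability if the prompt mentions y."""
    return 0.9 if y in prompt else 0.1


dataset = [("2+2=", "4"), ("3+1=", "4"), ("1+1=", "2")]
prompts = ["answer 4", "answer 2", "hello"]
scores = batch_compute_score_dataset(prompts, dataset, toy_prob)
best = prompts[max(range(len(prompts)), key=lambda i: scores[i])]
```

Each score is the empirical estimate of E [P(y | u + x)] for one candidate, so taking the argmax over `scores` picks the next prompt.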

Todo

[ ] Adjustable eval_batch_size (how many training examples we use to compute the score of each alternative prompt)

[ ] Store + return prompts and losses over the course of optimization (per iteration)

[ ] Clever system for evaluating alternative prompts on a small subset of the training data, then re-evaluating only the promising ones on a larger subset (hopefully logarithmic complexity; avoids an expensive batch score computation over the whole dataset for every alternative prompt)
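One way to sketch the coarse-to-fine evaluation in the last item is successive halving: score every candidate on a small random subset, keep the best half, double the subset size, and repeat. This is a swapped-in technique, not something the issue or the repository specifies. The names `successive_halving` and `score_fn`, and the way `eval_batch_size` is used, are hypothetical.

```python
import random


def successive_halving(candidates, dataset, score_fn, eval_batch_size=2, seed=0):
    """Return (best_candidate, history) for a non-empty candidate list.

    score_fn(u, x, y) is assumed to be higher-is-better, e.g. P(y | u + x).
    history holds one (subset_size, [(candidate, mean_score), ...]) per round.
    """
    rng = random.Random(seed)
    survivors = list(candidates)
    n = min(eval_batch_size, len(dataset))
    history = []
    while True:
        subset = rng.sample(dataset, n)
        scored = [
            (u, sum(score_fn(u, x, y) for x, y in subset) / n) for u in survivors
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        history.append((n, scored))
        if n == len(dataset):
            # Survivors were scored on the full dataset; take the best one.
            return scored[0][0], history
        survivors = [u for u, _ in scored[: max(1, len(scored) // 2)]]
        if len(survivors) == 1:
            return survivors[0], history
        n = min(2 * n, len(dataset))


def toy_score(prompt, x, y):
    """Toy stand-in for a model: high score if the prompt mentions y."""
    return 0.9 if y in prompt else 0.1


data = [(f"q{i}", "4") for i in range(8)]
prompts = ["hello", "answer 4", "answer 2", "foo"]
best, history = successive_halving(prompts, data, toy_score, eval_batch_size=2)
```

Because the candidate count halves while the subset size doubles, each round costs roughly the same number of score evaluations, and the number of rounds grows only logarithmically with the dataset size. In the toy run above, 16 evaluations are used instead of the 32 needed to score all four prompts on all eight examples. The returned `history` also gives a per-round record of candidates and scores.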
