amanb2000 / Magic_Words

Code for the paper "What's the Magic Word? A Control Theory of LLM Prompting"
MIT License

Track equivalent prompts in greedy_forward_reachability() #8

Open amanb2000 opened 3 months ago

amanb2000 commented 3 months ago

https://github.com/amanb2000/Magic_Words/blob/c2a36171be05937a7127a4861030b1ded40f6ea9/magic_words/greedy_forward.py#L274

Y_to_U[y]['equivalent_u'] = [[u_1], [u_2], ...] should hold every prompt u_i that satisfies y = argmax_{y'} P(y' | u_i + x_0), i.e. every prompt that steers the model to the same greedy output y.
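A minimal sketch of the bookkeeping this asks for, not the repository's implementation. The helper track_equivalent_prompts, its arguments, and the toy scoring function are hypothetical names introduced here for illustration. Only the Y_to_U[y]['equivalent_u'] layout comes from the issue text. The real greedy_forward_reachability() would use the model's argmax over next-token logits in place of the toy callable.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence


def track_equivalent_prompts(
    candidate_prompts: Iterable[Sequence[int]],
    x_0: Sequence[int],
    greedy_next_token: Callable[[List[int]], int],
) -> Dict[int, Dict[str, List[List[int]]]]:
    """Group prompts u by the greedy output y = argmax_{y'} P(y' | u + x_0).

    Hypothetical helper: `greedy_next_token` stands in for a model call
    that returns the argmax next token for a token sequence.
    """
    Y_to_U: Dict[int, Dict[str, List[List[int]]]] = defaultdict(
        lambda: {"equivalent_u": []}
    )
    for u in candidate_prompts:
        # u + x_0: the prompt tokens are prepended to the fixed state x_0.
        y = greedy_next_token(list(u) + list(x_0))
        # Keep every prompt that reaches y, not just the first one found.
        Y_to_U[y]["equivalent_u"].append(list(u))
    return dict(Y_to_U)


def toy_greedy_next_token(tokens: List[int]) -> int:
    # Assumption: a deterministic stand-in for the model's argmax.
    return sum(tokens) % 3


result = track_equivalent_prompts(
    candidate_prompts=[[1], [2], [4], [5]],
    x_0=[0],
    greedy_next_token=toy_greedy_next_token,
)
print(result)
```

Storing each u_i as its own list keeps the [[u_1], [u_2], ...] shape from the issue and leaves room for multi-token prompts.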