amanb2000 / Magic_Words

Code for the paper "What's the Magic Word? A Control Theory of LLM Prompting"

Example Script #1

Closed · mbadikyan closed 9 months ago

mbadikyan commented 10 months ago

Hello!

Just finished reading your paper, and I'm excited to put your code to use. Would you provide a small example script for how to run the code?

Thank you!

amanb2000 commented 10 months ago

Here's an example script in scripts/backoff_hack_demo.py!

The top-level comment has most of the details -- it does what we did in the paper: it runs greedy search for k=1,2,3 tokens, then greedy coordinate gradient for k=4,5,...,10 tokens, and after each search run it checks whether the prompt satisfies the argmax condition for the answer. You can call the function backoff_hack_qa_ids() directly, or just change the question and answer on lines 113-114 of backoff_hack_demo.py as needed. Note that the answer must be exactly 1 token (hard to compute argmax over more than 1 token 😅).
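For reference, here's a minimal sketch of calling backoff_hack_qa_ids() directly. The argument order (question IDs, answer IDs, model, tokenizer) matches the demo, but the import path, model choice, and the question/answer strings below are just illustrative:

```python
# Minimal sketch (not the demo script itself). Assumes backoff_hack_qa_ids()
# can be imported from scripts/backoff_hack_demo.py and takes
# (question_ids, answer_ids, model, tokenizer). Model and strings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from scripts.backoff_hack_demo import backoff_hack_qa_ids

model_name = "tiiuae/falcon-7b"  # any causal LM your GPU can hold
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

question = "What is the meaning of life? "
answer = "42"  # must tokenize to exactly ONE token

question_ids = tokenizer(question, return_tensors="pt").input_ids.to(model.device)
answer_ids = tokenizer(answer, return_tensors="pt",
                       add_special_tokens=False).input_ids.to(model.device)
assert answer_ids.shape[1] == 1, "answer must be a single token"

return_dict = backoff_hack_qa_ids(question_ids, answer_ids, model, tokenizer)
print(return_dict)
```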

Let me know if you have any other questions!

mbadikyan commented 10 months ago

Getting this memory error:

Traceback (most recent call last):
  File "/home/mbadikyan/Desktop/screaming_fist/llm_agents/Magic_Words/scripts/backoff_hack.py", line 246, in <module>
    return_dict = backoff_hack_qa_ids(question_ids, answer_ids, model, tokenizer)
  File "/home/mbadikyan/Desktop/screaming_fist/llm_agents/Magic_Words/scripts/backoff_hack.py", line 110, in backoff_hack_qa_ids
    new_prompt, _ = greedy_prompt_hack_qa_ids(pq, answer_ids,
  File "/home/mbadikyan/Desktop/screaming_fist/llm_agents/Magic_Words/magic_words/prompt_hack_qa.py", line 77, in greedy_prompt_hack_qa_ids
    message_scores = batch_compute_score(message_ids,
  File "/home/mbadikyan/envs/venv_llm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/mbadikyan/Desktop/screaming_fist/llm_agents/Magic_Words/magic_words/batch_compute_score.py", line 58, in batch_compute_score
    msg_scores, avg_loss = compute_score(message_ids[start:end, :],
  File "/home/mbadikyan/Desktop/screaming_fist/llm_agents/Magic_Words/magic_words/compute_score.py", line 53, in compute_score
    output = model(full_ids, labels=full_ids)
  File "/home/mbadikyan/envs/venv_llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mbadikyan/envs/venv_llm/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/mbadikyan/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/898df1396f35e447d5fe44e0a3ccaaaa69f30d36/modeling_falcon.py", line 900, in forward
    transformer_outputs = self.transformer(
  File "/home/mbadikyan/envs/venv_llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mbadikyan/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/898df1396f35e447d5fe44e0a3ccaaaa69f30d36/modeling_falcon.py", line 797, in forward
    outputs = block(
  File "/home/mbadikyan/envs/venv_llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mbadikyan/envs/venv_llm/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/mbadikyan/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/898df1396f35e447d5fe44e0a3ccaaaa69f30d36/modeling_falcon.py", line 477, in forward
    mlp_output = self.mlp(mlp_layernorm_out)
  File "/home/mbadikyan/envs/venv_llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mbadikyan/envs/venv_llm/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/mbadikyan/.cache/huggingface/modules/transformers_modules/tiiuae/falcon-7b/898df1396f35e447d5fe44e0a3ccaaaa69f30d36/modeling_falcon.py", line 410, in forward
    x = self.dense_4h_to_h(x)
  File "/home/mbadikyan/envs/venv_llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/mbadikyan/envs/venv_llm/lib/python3.10/site-packages/accelerate/hooks.py", line 160, in new_forward
    args, kwargs = module._hf_hook.pre_forward(module, *args, **kwargs)
  File "/home/mbadikyan/envs/venv_llm/lib/python3.10/site-packages/accelerate/hooks.py", line 286, in pre_forward
    set_module_tensor_to_device(
  File "/home/mbadikyan/envs/venv_llm/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 313, in set_module_tensor_to_device
    new_value = value.to(device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 158.00 MiB (GPU 0; 3.81 GiB total capacity; 2.70 GiB already allocated; 143.25 MiB free; 2.81 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

When I have plenty of memory in my GPU:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.147.05   Driver Version: 525.147.05   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A100...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   51C    P8     4W /  35W |    123MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2771      G   /usr/lib/xorg/Xorg                44MiB  |
|    0   N/A  N/A    112259    C+G   ...823541921611338742,262144      77MiB  |
+-----------------------------------------------------------------------------+

Any fixes? Thank you.

amanb2000 commented 10 months ago

Main Fix: Falcon-7b -> GPT-2

Looks like you have only ~4 GB of GPU memory -- the model in the script is Falcon-7b, which needs about 14 GB of GPU memory for the weights alone at 16-bit precision:

(7*10^9 parameters) * (16 bits / param) * (1 GB / (8*10^9 bits)) = 14 GB
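The same back-of-the-envelope estimate in code (weights only):

```python
# Rough parameter-memory estimate: weights only, no activations or KV cache.
n_params = 7e9          # Falcon-7B
bytes_per_param = 2     # fp16 / bf16
print(f"{n_params * bytes_per_param / 1e9:.0f} GB")  # -> 14 GB
```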

That doesn't even include memory for activations, so I've updated the script to work with a smaller model like GPT-2. You can now run the demo with that model:

# using commit 605ec28 or later:
>>> python3 scripts/backoff_hack.py --model gpt-2-small --seed 42

That should produce the following final result:

Decoded Optimal prompt (u):   42 NCTinyl
Optimal prompt length (tokens, |u|):  4
Prompt loss:  1.6640625
Prompt is correct!

THEREFORE: `42` = argmax_a P(a | ` 42 NCTinyl` + `What is the meaning of life? `)

Let me know if that works and you get the same output 🙂 It might still be tight on GPU memory -- peak usage on my machine was 3900 MB, which is close to your maximum.

Other Improvements

You should also consider following the advice in the out-of-memory error:

If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

So try

>>> export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:<value>

Replace <value> with a split-size threshold in MiB: the caching allocator won't split cached blocks larger than this, which helps avoid the fragmentation the error message is complaining about.
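For example (128 MiB here is just an illustrative value -- tune it for your setup):

>>> export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
>>> python3 scripts/backoff_hack.py --model gpt-2-small --seed 42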

mbadikyan commented 10 months ago

That worked, thanks for the fix.

Here is my output:

Decoded Optimal prompt (u):   41SAY Mesa
Optimal prompt length (tokens, |u|):  4
Prompt loss:  0.1534423828125
Prompt is correct!

THEREFORE: "42" = argmax_a P(a | "41SAY Mesa" + "What is the meaning of life? ")

Suppose I wanted to use an LLM hosted on a remote machine that I could prompt with the Python requests library, like so:

output = requests.post(LLM_URI, prompt_and_input)

What aspects of the LLM would I need to still be able to find magic words for a given input/output pair?

amanb2000 commented 9 months ago

We use two methods to find magic words in the paper: for short prompts (1-3 tokens) we use our own greedy "back-generation" search algorithm, and for longer prompts (4+ tokens) we use greedy coordinate gradient (GCG).

Greedy back-generation requires measuring the loss on the desired final token -- i.e., computing the next-token logits, logits = P(next_token | prompt_u + imposed_x), and then loss = CE_loss(logits, desired_token_y). Importantly, this is ONLY the loss on the final token, which I haven't seen many inference APIs expose.
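For concreteness, here's a rough sketch of that final-token loss with a local HuggingFace model (function and variable names are illustrative, not the repo's compute_score() API):

```python
# Sketch of the single-token loss both search methods need.
# Names are illustrative; this is not the repo's compute_score() implementation.
import torch
import torch.nn.functional as F

def final_token_loss(model, prompt_u_ids, imposed_x_ids, desired_token_y_id):
    """Cross-entropy loss on ONLY the final next-token prediction.

    prompt_u_ids: [1, |u|], imposed_x_ids: [1, |x|], desired_token_y_id: int
    """
    full_ids = torch.cat([prompt_u_ids, imposed_x_ids], dim=1)  # [1, |u| + |x|]
    with torch.no_grad():
        logits = model(full_ids).logits[:, -1, :]               # next-token logits
    target = torch.tensor([desired_token_y_id], device=logits.device)
    return F.cross_entropy(logits, target)
```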

Greedy coordinate gradient requires the same loss computation AND backpropagating that loss through the model to the embedding layer. The code for that is in magic_words/easy_gcg.py. Unless you're the one building the API, that computation is probably out of scope for generic inference APIs.
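The extra gradient step looks roughly like this -- a generic GCG-style one-hot gradient sketch, not necessarily line-for-line what easy_gcg.py does:

```python
# Generic GCG-style gradient sketch: gradient of the same final-token loss with
# respect to a one-hot encoding of the prompt tokens. Not the repo's easy_gcg.py.
import torch
import torch.nn.functional as F

def prompt_token_gradients(model, prompt_u_ids, imposed_x_ids, desired_token_y_id):
    """prompt_u_ids: [1, k], imposed_x_ids: [1, |x|]; returns [k, vocab_size] grads."""
    embed = model.get_input_embeddings()
    device = embed.weight.device

    # One-hot encode the k prompt tokens so we can differentiate w.r.t. them.
    one_hot = F.one_hot(prompt_u_ids[0].to(device),
                        num_classes=embed.num_embeddings).to(embed.weight.dtype)
    one_hot.requires_grad_(True)

    prompt_embeds = one_hot @ embed.weight                  # [k, d_model]
    x_embeds = embed(imposed_x_ids.to(device))[0]           # [|x|, d_model]
    inputs_embeds = torch.cat([prompt_embeds, x_embeds], dim=0).unsqueeze(0)

    logits = model(inputs_embeds=inputs_embeds).logits[:, -1, :]
    target = torch.tensor([desired_token_y_id], device=logits.device)
    loss = F.cross_entropy(logits, target)
    loss.backward()
    return one_hot.grad
```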

Hope that helps! If you want to build a server for HuggingFace models that does this, let me know -- I'm happy to share starter code from my own FastAPI inference server, which exposes a compute_loss() function for exactly that logits = P(next_token | prompt_u + imposed_x), loss = CE_loss(logits, desired_token_y) computation. Backpropagating the gradients would be an extra step, but the code here should be quite applicable.
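A bare-bones version of such an endpoint might look like this (a sketch only, not my actual server -- the route name, payload fields, and model choice are made up):

```python
# Minimal sketch of a loss-serving endpoint. Route name and fields are illustrative.
from fastapi import FastAPI
from pydantic import BaseModel
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

app = FastAPI()
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

class LossRequest(BaseModel):
    prompt_u: str
    imposed_x: str
    desired_token_y: str  # must be exactly 1 token

@app.post("/compute_loss")
def compute_loss(req: LossRequest):
    # CE loss of the desired single token given prompt_u + imposed_x.
    ids = tok(req.prompt_u + req.imposed_x, return_tensors="pt").input_ids
    y_ids = tok(req.desired_token_y, add_special_tokens=False).input_ids
    assert len(y_ids) == 1, "desired_token_y must be a single token"
    with torch.no_grad():
        logits = lm(ids).logits[:, -1, :]
    loss = F.cross_entropy(logits, torch.tensor(y_ids))
    return {"loss": loss.item()}
```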

mbadikyan commented 9 months ago

I appreciate all the time you put into your responses in this exchange. I will reach out if my team and I have something of interest (or if I get confused again 🙃).