bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License

Fix prompt tuning after #464 (#501)

Closed · borzunov closed this 10 months ago

borzunov commented 10 months ago

Unfortunately, inference for models loaded with "ptune" in config.tuning_mode broke after #464.
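
For a self-contained reproduction, the model can be loaded roughly like this (a sketch: the exact checkpoint is a guess inferred from the 60-block route and hidden size 8192 in the traceback below, and the tuning_mode/pre_seq_len arguments follow the prompt-tuning example from the Petals README):

>>> from transformers import AutoTokenizer
>>> from petals import AutoDistributedModelForCausalLM
>>> model_name = "tiiuae/falcon-40b"  # assumed; any Falcon checkpoint served on the swarm
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
>>> model = AutoDistributedModelForCausalLM.from_pretrained(
...     model_name, tuning_mode="ptune", pre_seq_len=16  # pre_seq_len value is arbitrary
... ).cuda()

Generation then fails: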

>>> inputs = tokenizer("A quick brown fox", return_tensors="pt")["input_ids"].cuda()
>>> outputs = model.generate(inputs, max_new_tokens=7)
>>> print("generated:", tokenizer.decode(outputs[0]))

Sep 04 07:31:37.766 [INFO] Route found: 0:60 via …NK5GM4
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-5d669f0ad493> in <cell line: 2>()
      1 inputs = tokenizer("A quick brown fox", return_tensors="pt")["input_ids"].cuda()
----> 2 outputs = model.generate(inputs, max_new_tokens=7)
      3 print("generated:", tokenizer.decode(outputs[0]))

(7 frames hidden)
/usr/local/lib/python3.10/dist-packages/petals/models/falcon/model.py in forward(self, input_ids, past_key_values, attention_mask, head_mask, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
    100         # Add last hidden state
    101         hidden_states = self.ln_f(hidden_states)
--> 102         hidden_states = hidden_states.view(output_shape)
    103         return BaseModelOutputWithPastAndCrossAttentions(
    104             last_hidden_state=hidden_states,

RuntimeError: shape '[1, 1, 8192]' is invalid for input of size 0
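
The error message itself shows that the final reshape received a zero-element tensor, i.e. hidden_states came back empty before the view call. The reshape failure can be demonstrated in isolation with plain PyTorch (a standalone illustration, not Petals code):

>>> import torch
>>> hidden_states = torch.empty(0)   # zero-element tensor, as in the failing forward pass
>>> hidden_states.view(1, 1, 8192)   # raises the same RuntimeError as above
RuntimeError: shape '[1, 1, 8192]' is invalid for input of size 0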