Closed borzunov closed 10 months ago
Unfortunately, running inference of models with `"ptune" in config.tuning_mode` was broken after #464:
```python
>>> inputs = tokenizer("A quick brown fox", return_tensors="pt")["input_ids"].cuda()
>>> outputs = model.generate(inputs, max_new_tokens=7)
Sep 04 07:31:37.766 [INFO] Route found: 0:60 via …NK5GM4
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-5d669f0ad493> in <cell line: 2>()
      1 inputs = tokenizer("A quick brown fox", return_tensors="pt")["input_ids"].cuda()
----> 2 outputs = model.generate(inputs, max_new_tokens=7)
      3 print("generated:", tokenizer.decode(outputs[0]))

7 frames
/usr/local/lib/python3.10/dist-packages/petals/models/falcon/model.py in forward(self, input_ids, past_key_values, attention_mask, head_mask, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
    100         # Add last hidden state
    101         hidden_states = self.ln_f(hidden_states)
--> 102         hidden_states = hidden_states.view(output_shape)
    103         return BaseModelOutputWithPastAndCrossAttentions(
    104             last_hidden_state=hidden_states,

RuntimeError: shape '[1, 1, 8192]' is invalid for input of size 0
```