dzhengxin opened 5 months ago
Is there an existing issue for this?
Current Behavior
`chat` returns a normal response, but the token ids printed from `generate` contain only one token (id 5) beyond the input, which decodes to an empty string. Single-query inference via `chat` works; switching to `generate` for single-query or batched inference returns empty results in both cases.
```python
inputs = self._tokenizer(text_list, padding=True, return_tensors="pt")
inputs = inputs.to(self._model.device)
outputs = self._model.generate(**inputs, max_length=512, do_sample=False)
```
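One thing worth checking in the batched case (an assumption, not confirmed by the snippet): `transformers` decoder-only models generally need `tokenizer.padding_side = "left"` before batched `generate()`. With the default right padding, pad tokens sit between each prompt and the position where generation starts, which the model was never trained to continue from. A toy illustration of the token layout:

```python
# Toy illustration (not real tokenizer output): why right padding
# breaks batched generation for decoder-only models.
pad = 0
prompt = [5, 6, 7]

# Right padding: pad tokens separate the prompt from the position
# where generate() appends new tokens.
right_padded = prompt + [pad, pad]   # [5, 6, 7, 0, 0]

# Left padding: the prompt ends exactly where generation begins.
left_padded = [pad, pad] + prompt    # [0, 0, 5, 6, 7]

assert right_padded[-1] == pad   # generation would follow a pad token
assert left_padded[-1] != pad    # generation follows the real prompt
```

In practice this means setting `self._tokenizer.padding_side = "left"` before the `self._tokenizer(text_list, padding=True, ...)` call above.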
Both decoding approaches yield empty strings:

```python
llm_outputs = list()
for j, output in enumerate(outputs.tolist()):
    index = len(inputs["input_ids"][j])
    output1 = output[index:]
    response = self._tokenizer.decode(output1, skip_special_tokens=True)
    llm_outputs.append(response)

llm_outputs2 = self._tokenizer.batch_decode(outputs)
```
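A likely cause (assuming a ChatGLM-style model, since the model class isn't shown): `model.chat()` wraps the query in a conversation template before tokenizing, while `generate()` here receives the raw text. Without that wrapping the model often emits an end-of-sequence token immediately, which would match the single extra token id and the empty decode above. A minimal sketch of the ChatGLM2-style wrapping; the authoritative version is the tokenizer's own `build_prompt` (or `apply_chat_template` in newer `transformers`):

```python
def build_chatglm2_prompt(query, history=None):
    """Approximate the prompt wrapping that ChatGLM2's model.chat()
    applies before tokenization. Passing the raw query straight to
    generate() skips this wrapping, which can make the model emit an
    end-of-sequence token immediately (an empty decoded response)."""
    history = history or []
    prompt = ""
    # Each past (query, response) pair becomes one numbered round.
    for i, (old_query, response) in enumerate(history):
        prompt += "[Round {}]\n\n问：{}\n\n答：{}\n\n".format(i + 1, old_query, response)
    # The current query gets the final round, left open for the answer.
    prompt += "[Round {}]\n\n问：{}\n\n答：".format(len(history) + 1, query)
    return prompt
```

With this, tokenizing `[build_chatglm2_prompt(t) for t in text_list]` instead of `text_list` should reproduce what `chat` feeds the model.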