THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

logits_prob output #1257

Open qingkongzhiqian opened 1 year ago

qingkongzhiqian commented 1 year ago

### How to output the probability of each token during decoding

Solution

Modify the stream_generate function in modeling_chatglm.py to also yield the probability of each sampled token:

            # sample
            probs = nn.functional.softmax(next_token_scores, dim=-1)

            if generation_config.do_sample:
                next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
            else:
                next_tokens = torch.argmax(probs, dim=-1)

            # update generated ids, model inputs, and length for next step
            input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1)
            model_kwargs = self._update_model_kwargs_for_generation(
                outputs, model_kwargs, is_encoder_decoder=self.config.is_encoder_decoder
            )
            unfinished_sequences = unfinished_sequences.mul((sum(next_tokens != i for i in eos_token_id)).long())

            # stop when each sentence is finished, or if we exceed the maximum length
            if unfinished_sequences.max() == 0 or stopping_criteria(input_ids, scores):
                break
            yield input_ids, probs[0][next_tokens]  # new: also yield the probability of the sampled token
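
For reference, a minimal sketch of how the patched generator could be consumed. The model path, generation arguments, and the build_inputs call are assumptions based on the chat()/stream_chat() flow in this repo, not tested code:

# Hypothetical usage of the patched stream_generate (a sketch, not a tested implementation).
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

inputs = model.build_inputs(tokenizer, "你好", history=[])  # same helper that chat() uses internally
for input_ids, token_prob in model.stream_generate(**inputs, do_sample=True,
                                                    top_p=0.8, temperature=0.8):
    new_token_id = input_ids[0, -1].item()           # id of the token generated at this step
    print(tokenizer.decode([new_token_id]), float(token_prob))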
pjw80921 commented 1 year ago

Could you post the code showing how to call this function to get the logit probs?

Can the generate() method return the logit probs as well?

xuzf-git commented 1 year ago

The generate() called inside ChatGLMForConditionalGeneration is inherited from transformers.generation_utils.GenerationMixin.generate. That function has two parameters, return_dict_in_generate and output_scores, both False by default; set them to True and it will output the logits.

@torch.inference_mode()
def chat(self, tokenizer, query: str, history: List[Tuple[str, str]] = None, max_length: int = 8192, num_beams=1, do_sample=True, top_p=0.8, temperature=0.8, logits_processor=None, **kwargs):
    if history is None:
        history = []
    if logits_processor is None:
        logits_processor = LogitsProcessorList()
    logits_processor.append(InvalidScoreLogitsProcessor())
    gen_kwargs = {
        "max_length": max_length,
        "num_beams": num_beams,
        "do_sample": do_sample,
        "top_p": top_p,
        "temperature": temperature,
        "logits_processor": logits_processor,
        "return_dict_in_generate": True,    # added: return a ModelOutput instead of a bare tensor
        "output_scores": True,    # added: also return the scores (processed logits) of each generated token
        **kwargs,
    }
    inputs = self.build_inputs(tokenizer, query, history=history)
    outputs = self.generate(**inputs, **gen_kwargs)
    gen_ids = outputs.sequences.tolist()[0][len(inputs["input_ids"][0]) :]
    response = tokenizer.decode(gen_ids)
    # scores is a tuple that contains only the logits for the newly generated tokens, shape new_token_num * vocab_size
    scores = outputs.scores
    response = self.process_response(response)
    history = history + [(query, response)]
    return response, history, gen_ids, scores
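
A quick sketch of calling this patched chat(); the model loading lines are assumptions, the call itself just mirrors the signature above:

# Hypothetical usage of the patched chat() (a sketch, not a tested implementation).
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

response, history, gen_ids, scores = model.chat(tokenizer, "你好", history=[])
print(response)
print(len(gen_ids), len(scores))  # one score tensor of shape [1, vocab_size] per generated token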
pjw80921 commented 1 year ago

> (quoting the patched chat() reply above: set return_dict_in_generate and output_scores to True)

Thanks! Does new_token_num here correspond to gen_ids, i.e. len(gen_ids)?

xuzf-git commented 1 year ago

> Thanks! Does new_token_num here correspond to gen_ids, i.e. len(gen_ids)?

Yes.

jackiey99 commented 1 year ago

@xuzf-git Have you looked at the per-token probabilities when models like ChatGLM, Qwen, or Baichuan generate output? I find that most tokens have a probability of 1. Is that what you see too?

xuzf-git commented 1 year ago

> @xuzf-git Have you looked at the per-token probabilities when models like ChatGLM, Qwen, or Baichuan generate output? I find that most tokens have a probability of 1. Is that what you see too?

Yes, that matches my observation: the outputs are highly deterministic. I'm not sure whether that is an effect of the decoding strategy.

datalee commented 1 year ago

These scores are not really meaningful as a reference.

pjw80921 commented 1 year ago

> These scores are not really meaningful as a reference.

If the output is only a single token, isn't this score exactly the score for judging whether the answer is correct?

xuzf-git commented 1 year ago

> These scores are not really meaningful as a reference.

Even without normalizing over the vocabulary, this score should be usable as a reference signal for classification tasks.

1028686314 commented 1 year ago

Many of the scores I get this way are -inf.

xuzf-git commented 1 year ago

> Many of the scores I get this way are -inf.

Each generated position corresponds to a logit vector of length vocab_size, where each dimension is the logit of the corresponding vocabulary token appearing at that position. Since the vast majority of vocabulary tokens should not appear at that position, most of the values are -inf.

1028686314 commented 1 year ago

Thanks for the explanation. For a whole sentence, how would you compute a confidence score from the output scores? It would then depend on the number of tokens.

xuzf-git commented 1 year ago

> Thanks for the explanation. For a whole sentence, how would you compute a confidence score from the output scores? It would then depend on the number of tokens.

Under the conditional language-model formulation, you apply softmax over the vocabulary at each position to get the conditional probability of each generated token, and then multiply these probabilities together. A sketch of this computation is below.
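
For example, a minimal sketch under that formulation, assuming gen_ids and scores come from the patched chat() above (so each element of scores has shape [1, vocab_size]):

import torch

# Turn the returned scores into per-token and sentence-level probabilities (a sketch).
token_probs = []
for step_scores, token_id in zip(scores, gen_ids):
    step_probs = torch.softmax(step_scores[0], dim=-1)  # normalize over the vocabulary; -inf becomes 0
    token_probs.append(step_probs[token_id].item())     # conditional probability of the chosen token

sentence_prob = torch.prod(torch.tensor(token_probs)).item()        # product of conditional probabilities
mean_log_prob = torch.log(torch.tensor(token_probs)).mean().item()  # length-normalized alternative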

1028686314 commented 1 year ago

In that case, the more tokens a sentence has, the smaller the joint probability tends to be, which feels a bit odd. It still doesn't measure confidence very well.

jackiey99 commented 1 year ago

You can take the min over all token probabilities; for the details see "A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation" and "Active Retrieval Augmented Generation". A tiny sketch is below.
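
A small sketch of that min-based score, assuming token_probs is the per-token probability list from the earlier sketch:

# Min-over-tokens confidence (a sketch; token_probs holds per-token probabilities).
token_probs = [0.99, 0.97, 0.42, 1.0]  # example values for illustration
min_confidence = min(token_probs)      # a single low-probability token flags an unreliable generation
print(min_confidence)                  # 0.42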

That said, most tokens from these large models have a probability of 1, though that may be because I set the temperature to 0.1... blindly confident models.

1028686314 commented 1 year ago

Thanks to everyone for the explanations.