THUDM / ChatGLM3

ChatGLM3 series: Open Bilingual Chat LLMs | 开源双语对话语言模型
Apache License 2.0

Abnormal stop during text completion #1160

Closed · estherche113 closed this 3 months ago

estherche113 commented 4 months ago

System Info / 系統信息

GLM3 stopped abnormally during text completion, ending mid-sentence: “…覆盖器官癌症或早期严重疾病的证据。如果在十五天”

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

Reproduction / 复现过程

The following code triggers the error:

system_prompt_template = """你是一位知识助手,使用以下知识库和聊天记录来尽力回答问题。

- 知识库提供了以下相关知识:
========
资料来源:cancercareessence-basic-contract-en.pdf
CLAIMS PROCEDURES

(a)  Immediate notice in the event of death of the Insured; or
(b)  Within sixty (60) days after the Diagnosis of Cancer of the Covered Organs or Early Stage Critical Illness,

as the case may be.  Such notice given to us at our Issuing Office, with particulars sufficient to identify the  Insured,
shall be deemed to be notice to us.  Failure to give notice within such time shall not invalidate any claim if it shall be
shown not to have been reasonably possible to give such notice and that notice was given as soon as was reasonably
possible.

2.  Proof of Claim
  We, upon receipt of such notice, will furnish to the Claimant appropriate forms for filing proof of death, Cancer of the

Covered Organs or Early Stage Critical Illness (as the case may be).

If  the  forms  are  not  furnished  within  fifteen  (15)  days,  the  Claimant  shall  be  deemed  to  have  complied  with  the
requirements of this provision if he:

========
资料来源:cancercareessence-basic-contract-en.pdf
GENERAL PROVISIONS

THE CONTRACT

Your  Policy  is  a  legally  enforceable  agreement  between  you  and  us.  This  Policy  comes  into  force  on  the  Issue  Date
provided you have paid the full amount of the first premium and have submitted a signed and dated application.

========
资料来源:cancercareessence-basic-contract-en.pdf
5.  GENERAL TERMS AND CONDITIONS OF BENEFITS

(a)  The  Current  Sum  Assured  of  the  Basic  Policy  will  be  decreased  by  the  amount  of  any  Lump  Sum  Advance

Payment under Clause 2 herein and any Limited Advance Payment under Clause 3 herein.

(b)  Aggregate Limit -- The aggregate of the Lump Sum Advance Payment and the Limited Advance Payment made
hereunder shall not exceed the percentage of the Initial Sum Assured for Lump Sum Advance Payment shown in
the attached SCHEDULE OF BENEFIT.

(c)  The BENEFIT PROVISIONS shall cease to apply upon Reaching of Aggregate Limit.

6.  EXCLUSIONS

Except for the Death Benefit under Clause 1 above, this Basic Policy does not apply to any of the following or any
event which arises from the following:

(a)  any illness other than a Diagnosis of Cancer of the Covered Organs or Early Stage Critical Illness;
(b)  any illness the  signs or symptoms of  which first occurred prior to the Issue Date or the latest Commencement
None

==========
- 你需要回答以下新问题。
新问题:
{}

==========
- 如果找到答案,请以简洁的方式写出答案,使用中文作答
- 不要进行任何计算
- 如果不知道答案或知识库中不包含答案,回答‘抱歉,我不知道’。不要编造任何答案或数字。
-
- 提示,知识库在上文中提供了以下来源:cancercareessence-basic-contract-en.pdf
有帮助的答案:
资料来源:"""

test_seed = 1113

prompts = [{
        "input_prompt": system_prompt_template.format("有什麼保險條款"),
        "user_query": "有什麼保險條款",
        "random_seed": test_seed
    }
]

import os
import numpy as np
import random, torch
from datetime import datetime

from typing import Dict, Union, Optional
from torch.nn import Module
from transformers import AutoModel, AutoTokenizer

GLM3_MODEL_PATH = "Insert your GLM3 model path here"  # e.g. "D:/projectGLM3/ChatGLM3/chatglm3-6b-32k/chatglm3-6b-32k"
GPUS_AVAILABLE = 2

def set_random_seed(seed):
    if seed is None:
        return
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

def load_glm3_model():
    model = load_model_on_gpus(GLM3_MODEL_PATH, num_gpus=GPUS_AVAILABLE)
    model = model.eval()
    return model

def load_tokenizer():
    tokenizer = AutoTokenizer.from_pretrained(GLM3_MODEL_PATH, trust_remote_code=True)
    return tokenizer

def auto_configure_device_map(num_gpus: int) -> Dict[str, int]:
    # ChatGLM3-6B has 28 transformer layers; the embedding and the
    # output-side modules pinned to GPU 0 count as roughly two extra
    # layers, hence 30 units are split across the GPUs.
    num_trans_layers = 28
    per_gpu_layers = 30 / num_gpus

    # Modules that must stay on the same device as the inputs/outputs.
    device_map = {
        'transformer.embedding.word_embeddings': 0,
        'transformer.encoder.final_layernorm': 0,
        'transformer.output_layer': 0,
        'transformer.rotary_pos_emb': 0,
        'lm_head': 0
    }

    used = 2  # GPU 0 already carries the equivalent of two layers
    gpu_target = 0
    for i in range(num_trans_layers):
        if used >= per_gpu_layers:
            gpu_target += 1
            used = 0
        assert gpu_target < num_gpus
        device_map[f'transformer.encoder.layers.{i}'] = gpu_target
        used += 1

    return device_map

def load_model_on_gpus(checkpoint_path: Union[str, os.PathLike], num_gpus: int = 2,
                       device_map: Optional[Dict[str, int]] = None, **kwargs) -> Module:
    if num_gpus < 2 and device_map is None:
        model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True, **kwargs).half().cuda()
    else:
        from accelerate import dispatch_model

        model = AutoModel.from_pretrained(checkpoint_path, trust_remote_code=True, **kwargs).half()

        if device_map is None:
            device_map = auto_configure_device_map(num_gpus)

        model = dispatch_model(model, device_map=device_map)

    return model

model = load_glm3_model()
TOKENIZER = load_tokenizer()
TEMPERATURE = 0.1 #Test

"""Modify this test_indx to try different cases (now only test_indx=1)"""
test_indx_list = [0]

loop_test_num = 1 #100
step = 1

start_time = datetime.now().strftime('%Y%m%d_%H%M%S')
print(start_time)

repetition_penalty = 1.2
max_tokens = 1024
for prompt_indx in test_indx_list:
    for i in range(loop_test_num):
        random_seed = prompts[prompt_indx]['random_seed'] + i*step
        set_random_seed(random_seed)

        input_prompt = prompts[prompt_indx]['input_prompt']
        user_query = prompts[prompt_indx]['user_query']
        system_info = {'role': 'system', 'content': input_prompt}
        reply, history = model.chat(
            tokenizer=TOKENIZER,
            query=user_query,
            history=[system_info],
            temperature=TEMPERATURE,
            repetition_penalty=repetition_penalty,
            max_new_tokens=max_tokens,
        )
        is_potential_sudden_stop = not reply.endswith((".", "。", "?", "?", "!", "!", ")", ")"))
        result_dict = {
            "Potential Sudden Stop": is_potential_sudden_stop if is_potential_sudden_stop else None,
            "Question": user_query,
            "Seed": random_seed,
            "Reply": reply,
        }
        print(f"=================Result  {i+1}=================")
        print("\n".join(str(item) for item in result_dict.items()))

The model's output stopped abnormally at “如果在十五天”:

=================Result  1=================
('Potential Sudden Stop', True)
('Question', '有什麼保險條款')
('Seed', 1113)
('Reply', '根据您提供的信息,我为您总结了癌症护理本质基本合同(cancer care essence basic contract)中的保险条款如下:\n\n1. 在被保险人去世时立即通知保险公司;或者在被诊断出覆盖器官的癌症或在早期阶段出现严重疾病后的六十天内,向我们的发行办公室提供详细情况以识别被保险人的通知。如果在此时间内未给出此类通知,除非证明在合理情况下不可能在规定的时间内给予该通知且已尽快给予通知,否则不会使任何索赔无效。\n\n2. 当收到上述通知后,保险公司将向申请人提供适当的表格以提交死亡证明、覆盖器官癌症或早期严重疾病的证据。如果在十五天')

Expected behavior / 期待表现

This error happens only occasionally and is hard to catch before the whole output has been generated. Does anyone know the cause, and how to prevent such abnormal stops?
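
One way to catch this programmatically, extending the endswith heuristic already in the script: compare the reply's token count with max_new_tokens to separate "ran into the token limit" from "model emitted its stop token mid-sentence". A minimal sketch; classify_stop and the margin buffer are illustrative assumptions, not part of the original script:

TERMINAL_PUNCT = (".", "。", "?", "?", "!", "!", ")", ")")

def classify_stop(reply: str, tokenizer, max_new_tokens: int, margin: int = 8) -> str:
    # margin: arbitrary buffer for special/terminator tokens (assumption)
    n_tokens = len(tokenizer.encode(reply))
    if n_tokens >= max_new_tokens - margin:
        return "hit_token_limit"  # truncated by max_new_tokens
    if not reply.endswith(TERMINAL_PUNCT):
        return "early_stop"  # stop token emitted mid-sentence, as reported here
    return "normal"

print(classify_stop(reply, TOKENIZER, max_tokens))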

zRzRzRzRzRzRzR commented 4 months ago

Check whether you have set an output cap. What if you change 1024 to 8192?
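
In the repro script above, that corresponds to the max_tokens value passed as max_new_tokens:

max_tokens = 8192  # instead of 1024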

estherche113 commented 4 months ago

Check whether you have set an output cap. What if you change 1024 to 8192?

It is not the output cap: this reply is only around 200 tokens, nowhere near 1024. Setting the limit to 8192 in testing produces the same output.
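
For reference, one way to confirm that count (assuming TOKENIZER and reply from the repro script are still in scope):

print(len(TOKENIZER.encode(reply)))  # roughly 200 here, far below max_new_tokens=1024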

zRzRzRzRzRzRzR commented 3 months ago

But the input is far too long.
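
To quantify that, one can measure the full input against the context budget. A sketch, assuming the variables from the repro script above; in the ChatGLM3 modeling code, chat() also takes a max_length argument (default 8192) that bounds the whole sequence, input plus output, so a very long RAG prompt leaves little room for the reply:

n_prompt_tokens = len(TOKENIZER.encode(input_prompt))
print(n_prompt_tokens)  # the long RAG system prompt dominates the budget

# Raising the sequence bound is one thing to try with the 32k model:
reply, history = model.chat(
    tokenizer=TOKENIZER,
    query=user_query,
    history=[system_info],
    max_length=32768,  # assumption: suitable for chatglm3-6b-32k
    temperature=TEMPERATURE,
    repetition_penalty=repetition_penalty,
)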