bytedance / ByteMLPerf

AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.
https://bytemlperf.ai/
Apache License 2.0

【Issue Help】 chatglm2-6b has some cases that mismatch the golden values #77

Open · DeepTecher opened this issue 2 months ago

DeepTecher commented 2 months ago

https://github.com/bytedance/ByteMLPerf/blob/main/byte_infer_perf/llm_perf/workloads/chatglm2-torch-fp16-6b.json

We ran this on an A100-40G and collected the output logits with the configuration below:

{
    "model": "chatglm2-torch-fp16-6b",
    "test_accuracy": true,
    "test_perf": true,
    "min_new_tokens": 128,
    "max_new_tokens": 256,
    "tp_sizes": [1, 2],
    "batch_sizes":[1, 2, 4, 8],
    "input_tokens": [1024, 2048],
    "dataset": "llm_perf/datasets/merged_52_test.csv",
    "perf_time": 180
}
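
For context, this workload presumably sweeps every combination of tp_sizes, batch_sizes and input_tokens. A minimal Python sketch (not ByteMLPerf code; the path is the workload file linked above, assumed relative to the repo root) that enumerates the combinations being tested:

import itertools
import json

# Load the workload file referenced earlier in this issue.
with open("byte_infer_perf/llm_perf/workloads/chatglm2-torch-fp16-6b.json") as f:
    workload = json.load(f)

# Each (tp, batch_size, input_tokens) triple is a separate test case.
for tp, bs, seq_len in itertools.product(
        workload["tp_sizes"], workload["batch_sizes"], workload["input_tokens"]):
    print(f"tp={tp} batch_size={bs} input_tokens={seq_len} "
          f"new_tokens={workload['min_new_tokens']}..{workload['max_new_tokens']}")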

It seems that some dimensions of the output do not match the golden values. Here is one of the 52 cases (a comparison sketch follows the sample):

id,question,A,B,C,D
0,"对于以下结构定义,++p->str中的++加在____
struct{
int len;
char*str;
}*P;",指针 p 上,指针 str 上,str 指的内容上,语法错误

(In English: "Given the following structure definition, the ++ in ++p->str is applied to ____", with choices A: the pointer p, B: the pointer str, C: the content that str points to, D: it is a syntax error.)
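
To make the kind of mismatch concrete, here is a hedged comparison sketch (not ByteMLPerf's accuracy checker; the .npy file names and the tolerances are placeholders):

import numpy as np

# Placeholder files: logits from the device under test and the stored golden logits.
device = np.load("device_logits.npy")
golden = np.load("golden_logits.npy")

if device.shape != golden.shape:
    # A dimension mismatch as described above, e.g. when the two runs
    # generate a different number of tokens along the sequence axis.
    print(f"shape mismatch: device {device.shape} vs golden {golden.shape}")
else:
    diff = np.abs(device - golden)
    print(f"max abs diff {diff.max():.6f}, mean abs diff {diff.mean():.6f}")
    print("close:", np.allclose(device, golden, rtol=1e-3, atol=1e-3))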
suisiyuan commented 2 months ago

To be confirmed.

suisiyuan commented 1 month ago

The previous golden values did not contain eos_token_id, and generation might stop once the number of generated tokens exceeded 512. The current golden values do contain eos_token_id, and generation will still stop once the number of generated tokens exceeds 512.
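
Read literally, what changed is the stopping rule used when producing the goldens. A minimal sketch of the current rule as I understand this comment (plain Python, not the actual golden-generation code; EOS_TOKEN_ID and sample_next_token are placeholders):

EOS_TOKEN_ID = 2        # illustrative value only, not chatglm2-6b's real eos id
MAX_GENERATED = 512     # cap mentioned in the comment above

def generate_golden(sample_next_token):
    # Current goldens: the eos_token_id is kept in the sequence, and
    # generation also stops once 512 tokens have been produced.
    # (Previous goldens differed in that eos_token_id was not kept.)
    tokens = []
    while len(tokens) < MAX_GENERATED:
        tok = sample_next_token()
        tokens.append(tok)
        if tok == EOS_TOKEN_ID:
            break
    return tokens

If the device-side run stops at eos while the old goldens did not keep it, the generated lengths can differ, which could show up as the dimension mismatches reported above.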