bytedance / ByteMLPerf

AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.
https://bytemlperf.ai/
Apache License 2.0

【Issue Help】 chatglm2-6b has some cases that mismatch the golden values #77

Open · DeepTecher opened this issue 2 months ago

DeepTecher commented 2 months ago

https://github.com/bytedance/ByteMLPerf/blob/main/byte_infer_perf/llm_perf/workloads/chatglm2-torch-fp16-6b.json

We ran this on an A100-40G and collected the output logits with the configuration below:

{
    "model": "chatglm2-torch-fp16-6b",
    "test_accuracy": true,
    "test_perf": true,
    "min_new_tokens": 128,
    "max_new_tokens": 256,
    "tp_sizes": [1, 2],
    "batch_sizes":[1, 2, 4, 8],
    "input_tokens": [1024, 2048],
    "dataset": "llm_perf/datasets/merged_52_test.csv",
    "perf_time": 180
}
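
For context, this workload presumably sweeps every combination of tp_sizes, batch_sizes and input_tokens. A minimal Python sketch (not ByteMLPerf code; the path is the workload file linked above, assumed relative to the repo root) that enumerates the combinations being tested:

import itertools
import json

# Load the workload file referenced earlier in this issue.
with open("byte_infer_perf/llm_perf/workloads/chatglm2-torch-fp16-6b.json") as f:
    workload = json.load(f)

# Each (tp, batch_size, input_tokens) triple is a separate test case.
for tp, bs, seq_len in itertools.product(
        workload["tp_sizes"], workload["batch_sizes"], workload["input_tokens"]):
    print(f"tp={tp} batch_size={bs} input_tokens={seq_len} "
          f"new_tokens={workload['min_new_tokens']}..{workload['max_new_tokens']}")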

It seems that some dimensions of the output do not match the golden values. Here is one of the 52 cases (a comparison sketch follows the sample):

id,question,A,B,C,D
0,"对于以下结构定义,++p->str中的++加在____
struct{
int len;
char*str;
}*P;",指针 p 上,指针 str 上,str 指的内容上,语法错误

(In English: "Given the following structure definition, the ++ in ++p->str is applied to ____", with choices A: the pointer p, B: the pointer str, C: the content that str points to, D: it is a syntax error.)
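
To make the kind of mismatch concrete, here is a hedged comparison sketch (not ByteMLPerf's accuracy checker; the .npy file names and the tolerances are placeholders):

import numpy as np

# Placeholder files: logits from the device under test and the stored golden logits.
device = np.load("device_logits.npy")
golden = np.load("golden_logits.npy")

if device.shape != golden.shape:
    # A dimension mismatch as described above, e.g. when the two runs
    # generate a different number of tokens along the sequence axis.
    print(f"shape mismatch: device {device.shape} vs golden {golden.shape}")
else:
    diff = np.abs(device - golden)
    print(f"max abs diff {diff.max():.6f}, mean abs diff {diff.mean():.6f}")
    print("close:", np.allclose(device, golden, rtol=1e-3, atol=1e-3))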
suisiyuan commented 2 months ago

To be confirmed.

suisiyuan commented 1 month ago

The previous golden values did not contain eos_token_id, and generation might stop once the number of generated tokens exceeded 512. The current golden values do contain eos_token_id, and generation will still stop once the number of generated tokens exceeds 512.
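
Read literally, what changed is the stopping rule used when producing the goldens. A minimal sketch of the current rule as I understand this comment (plain Python, not the actual golden-generation code; EOS_TOKEN_ID and sample_next_token are placeholders):

EOS_TOKEN_ID = 2        # illustrative value only, not chatglm2-6b's real eos id
MAX_GENERATED = 512     # cap mentioned in the comment above

def generate_golden(sample_next_token):
    # Current goldens: the eos_token_id is kept in the sequence, and
    # generation also stops once 512 tokens have been produced.
    # (Previous goldens differed in that eos_token_id was not kept.)
    tokens = []
    while len(tokens) < MAX_GENERATED:
        tok = sample_next_token()
        tokens.append(tok)
        if tok == EOS_TOKEN_ID:
            break
    return tokens

If the device-side run stops at eos while the old goldens did not keep it, the generated lengths can differ, which could show up as the dimension mismatches reported above.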