bytedance / ByteMLPerf

AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.
https://bytemlperf.ai/
Apache License 2.0

【Bug】Need fix -- output dimension mismatch with golden #76

Closed DeepTecher closed 3 months ago

DeepTecher commented 3 months ago

https://github.com/bytedance/ByteMLPerf/blob/c1d1835738cc17d974197f809fa503ec84e32d4d/byte_infer_perf/llm_perf/backends/GPU/gpu_scheduler.py#L60

Hi, we followed your script to run the test and found a problem here that needs to be fixed:

            # 5. add result to packet
            for i, gen_res in enumerate(generation_results):
                if gen_res.finish_reason:
                    batch[i].finish()
                batch[i].add_result(gen_res)
suisiyuan commented 3 months ago

It is correct here. There are two possible reasons to finish a task: 1. the generated token_id is eos_token_id; 2. the number of generated token_ids exceeds max_new_tokens. We always add the current token_id to the output, and finish the task based on finish_reason.
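The semantics described above can be sketched as a minimal, self-contained loop. The `GenerateResult` and `Task` classes below are hypothetical stand-ins for the scheduler's real types (which live in `gpu_scheduler.py`), not the actual ByteMLPerf implementation; the point is only that the final token is recorded regardless of whether the task finishes on that step.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GenerateResult:
    # Hypothetical stand-in for one decode step's result.
    token_id: int
    finish_reason: Optional[str] = None  # e.g. "eos" or "max_length", else None

@dataclass
class Task:
    # Hypothetical stand-in for a request packet in the batch.
    token_ids: List[int] = field(default_factory=list)
    finished: bool = False

    def add_result(self, res: GenerateResult) -> None:
        self.token_ids.append(res.token_id)

    def finish(self) -> None:
        self.finished = True

def schedule_step(batch: List[Task],
                  generation_results: List[GenerateResult]) -> None:
    # Always record the current token, then mark the task finished when the
    # backend reports a finish_reason (eos hit, or max_new_tokens exceeded).
    for task, gen_res in zip(batch, generation_results):
        task.add_result(gen_res)
        if gen_res.finish_reason:
            task.finish()

batch = [Task(), Task()]
results = [GenerateResult(token_id=2, finish_reason="eos"),
           GenerateResult(token_id=17)]
schedule_step(batch, results)
print([t.token_ids for t in batch])  # [[2], [17]]
print([t.finished for t in batch])   # [True, False]
```

Note the ordering: `add_result` runs before `finish`, so the token that triggered the stop (e.g. the EOS token) is still appended to the output before the task is closed. Whether the real code can call `finish` first depends on whether its `add_result` still accepts tokens on a finished task.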

DeepTecher commented 3 months ago

However, when we run on an A100, it raises an error unless we apply the following modification. Please confirm this on the chatglm case.

suisiyuan commented 3 months ago

> However, when we run on an A100, it raises an error unless we apply the following modification. Please confirm this on the chatglm case.

chatglm is out of date and will be removed later. There was an update today; you can try chatglm2 instead.