THUDM / CodeGeeX2

CodeGeeX2: A More Powerful Multilingual Code Generation Model
https://codegeex.cn
Apache License 2.0

Why is CodeGeeX2-6b much slower than ChatGLM2-6b #48

Open Wallong opened 1 year ago

Wallong commented 1 year ago

As the title says, on the same graphics card (an RTX 3090), CodeGeeX2-6B is much slower than ChatGLM2-6B. I am running it according to the official demo; I would like to know if there are any tricks to speed it up.

Wallong commented 1 year ago

I found that the way demo/run_demo.py runs inference is faster:

# From demo/run_demo.py: batch-tokenize the prompt and cap generation relative
# to the prompt length, so up to request.max_tokens new tokens are generated.
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=inputs['input_ids'].shape[-1] + request.max_tokens
)

instead of

inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
# Here max_length=256 is an absolute cap that also counts the prompt tokens.
outputs = model.generate(inputs, max_length=256, top_k=1)

I don't know the reason, but it did work for me. Does anyone know why?
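A plausible explanation (a sketch, assuming the standard Hugging Face transformers generate() API and the loading code from the CodeGeeX2 README): max_length is an absolute cap that already includes the prompt tokens, whereas the demo adds the token budget on top of the prompt length, so the two calls can produce very different numbers of new tokens and therefore very different runtimes. max_new_tokens expresses the prompt-relative budget directly.

from transformers import AutoTokenizer, AutoModel

# Model ID and prompt format as shown in the CodeGeeX2 README; adjust to your setup.
tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True).half().cuda()
model = model.eval()

prompt = "# language: Python\n# write a bubble sort function\n"
inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
prompt_len = inputs["input_ids"].shape[-1]

# Prompt-relative cap (as in demo/run_demo.py): up to 128 new tokens, regardless of prompt length.
out_demo = model.generate(**inputs, max_length=prompt_len + 128)

# Absolute cap: the prompt counts against the 256 limit, so a long prompt leaves
# few tokens to generate and a short one leaves many, which changes the runtime.
out_fixed = model.generate(**inputs, max_length=256, top_k=1)

# Equivalent prompt-relative budget, stated directly.
out_new = model.generate(**inputs, max_new_tokens=128)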

Stanislas0 commented 1 year ago

> As the title says, on the same graphics card (an RTX 3090), CodeGeeX2-6B is much slower than ChatGLM2-6B. I am running it according to the official demo; I would like to know if there are any tricks to speed it up.

The per-token speed should be the same. The total inference time also depends on how many output tokens are generated.
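One way to make the comparison fair (a sketch, assuming both models expose the standard transformers generate() API) is to normalize by the number of newly generated tokens instead of comparing wall-clock time for calls whose outputs may differ in length:

import time
import torch

def tokens_per_second(model, tokenizer, prompt, max_new_tokens=128):
    # Time a single generation and divide by the number of new tokens produced.
    inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
    torch.cuda.synchronize()
    start = time.time()
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()
    elapsed = time.time() - start
    new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
    return new_tokens / elapsed

Running this with the same prompt and the same max_new_tokens for CodeGeeX2-6B and ChatGLM2-6B should show whether the per-token speed actually differs.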