InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] seems a dead loop after loading gemm_config.in #297

Closed: sleepwalker2017 closed this issue 11 months ago

sleepwalker2017 commented 1 year ago

Describe the bug

I installed lmdeploy using pip.

After running this command on a V100 with vicuna-13B-v1.5: python profile_generation.py workspace --input_seqlen 512 --output_seqlen 192 --test_round 1

Before tuning, it runs OK. After tuning, it seems to go into a dead loop with CPU usage at 100%.

After I replace the library with the one I compiled myself, the problem is solved:

lmdeploy/lmdeploy/build# cp lib/_turbomind.cpython-38-x86_64-linux-gnu.so /opt/conda/lib/python3.8/site-packages/lmdeploy/lib/_turbomind.cpython-38-x86_64-linux-gnu.so

Reproduction

python profile_generation.py workspace --input_seqlen 512 --output_seqlen 192 --test_round 1

Error traceback

![image](https://github.com/InternLM/lmdeploy/assets/26128514/8b38c1b8-964e-4e6a-8b81-3c2c9502dae4)
AllentDan commented 1 year ago

> After I replace the library with the one I compiled myself, the problem is solved.

Is there anything different in the lib you compiled?

sleepwalker2017 commented 1 year ago

I'm not quite sure; maybe I just added -DSM=70, maybe nothing.

It's weird.

AllentDan commented 1 year ago

Can you reproduce the error again?

sleepwalker2017 commented 1 year ago

Yes. It can be reproduced by installing lmdeploy via pip. Does it run OK on your machine?

AllentDan commented 1 year ago

Of course. I did not do tuning.

sleepwalker2017 commented 1 year ago

Let me paste my gemm_config.in; you can try putting it in your working directory.

batch_size, seq_len, head_num, size_per_head dataType ### batchCount, n, m, k, algoId, customOption, tile, numSplitsK, swizzle, reductionScheme, workspaceSize, stages, exec_time
1 1 40 128 1 ### 1 15360 1 5120 113 -1 -1 -1 -1 -1 -1 -1 0.202213
1 1 40 128 1 ### 1 5120 1 5120 110 -1 -1 -1 -1 -1 -1 -1 0.079022
1 1 40 128 1 ### 1 13824 1 5120 113 -1 -1 -1 -1 -1 -1 -1 0.188743
1 1 40 128 1 ### 1 5120 1 13824 110 -1 -1 -1 -1 -1 -1 -1 0.189207
1 1 40 128 1 ### 1 7680 1 5120 110 -1 -1 -1 -1 -1 -1 -1 0.114368
1 1 40 128 1 ### 1 5120 1 2560 21 0 11 1 0 0 0 14 0.046746
1 1 40 128 1 ### 1 6912 1 5120 108 -1 -1 -1 -1 -1 -1 -1 0.103753
1 1 40 128 1 ### 1 5120 1 6912 110 -1 -1 -1 -1 -1 -1 -1 0.102635
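
For reference, a minimal Python sketch of how a file in this layout could be parsed, assuming the whitespace-separated columns named in the header line above. This is only an illustration of the format, not lmdeploy's actual parser (which is implemented in C++):

```python
# Sketch: parse gemm_config.in into a dict keyed by GEMM shape
# (batchCount, n, m, k) -> tuned algo record. Field names are taken
# from the header line shown above.
FIELDS = ["batchCount", "n", "m", "k", "algoId", "customOption", "tile",
          "numSplitsK", "swizzle", "reductionScheme", "workspaceSize",
          "stages", "exec_time"]

def parse_gemm_config(path="gemm_config.in"):
    algo_map = {}
    with open(path) as f:
        next(f)  # skip the header line
        for line in f:
            # the fields after '###' describe the tuned algo for one shape
            _, _, algo_part = line.partition("###")
            values = algo_part.split()
            if len(values) != len(FIELDS):
                continue  # malformed or differently-versioned line
            rec = dict(zip(FIELDS, values))
            key = tuple(int(rec[k]) for k in ("batchCount", "n", "m", "k"))
            algo_map[key] = rec
    return algo_map
```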
AllentDan commented 1 year ago

Where was the file put? In my understanding, the vicuna weights were changed after tuning. Right?

sleepwalker2017 commented 1 year ago

> Where was the file put? In my understanding, the vicuna weights were changed after tuning. Right?

This file is a dict from input shapes to cuBLAS GEMM algorithms. It only decides the computing method for the GEMMs and has no effect on the weights.

Put it in your current working directory.

When you don't have this file, lmdeploy gives a warning: lmdeploy/log:1:[WARNING] gemm_config.in is not found; using default GEMM algo

When you put this file there, the warning won't appear.
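
A tiny sketch of that behavior, assuming (as described above) that turbomind looks for the file in the current working directory. This is an illustration only, not lmdeploy code:

```python
# Mirror the check described above: if gemm_config.in is absent from the
# working directory, turbomind falls back to default GEMM algos and logs
# the warning quoted in the comment.
import os

if os.path.exists("gemm_config.in"):
    print("gemm_config.in found: tuned GEMM algos will be used")
else:
    print("[WARNING] gemm_config.in is not found; using default GEMM algo")
```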

sleepwalker2017 commented 1 year ago

I need to clarify: tuning here doesn't mean fine-tuning the model. It means tuning the cuBLAS GEMM algorithms.
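
In other words, the tuning step benchmarks candidate cuBLAS algorithms for each GEMM shape and keeps the fastest, which is what the exec_time column above records. A conceptual sketch of that loop, where run_gemm is a hypothetical callable standing in for an actual cuBLAS GEMM launch (not the turbomind tuner itself):

```python
# Conceptual sketch of GEMM algorithm tuning: time each candidate algo id
# for a fixed shape and record the fastest one.
import time

def pick_fastest_algo(candidates, run_gemm, rounds=10):
    """candidates: iterable of algo ids; run_gemm(algo_id) runs one GEMM."""
    best_id, best_time = None, float("inf")
    for algo_id in candidates:
        start = time.perf_counter()
        for _ in range(rounds):
            run_gemm(algo_id)
        elapsed = (time.perf_counter() - start) / rounds
        if elapsed < best_time:
            best_id, best_time = algo_id, elapsed
    return best_id, best_time
```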

AllentDan commented 1 year ago

I used vicuna-7b-v1.3 since I did not get vicuna-13B-v1.5. lmdeploy/turbomind/chat.py works fine for me, and so does python profile_generation.py workspace --input_seqlen 512 --output_seqlen 192 --test_round 1. Did you try other models?

AllentDan commented 1 year ago

Reproduced the error. It seems the prebuilt package lib cannot handle the gemm_config.in you provided.

@irexyc Could you have a look?

irexyc commented 1 year ago

@sleepwalker2017

What was your CUDA version when you built lmdeploy?

The prebuilt package is built with CUDA 11.8. There are some differences in how gemm_config.in is parsed/written across CUDA versions. https://github.com/InternLM/lmdeploy/blob/main/src/turbomind/utils/cublasAlgoMap.cc#L60-L66 https://github.com/InternLM/lmdeploy/blob/main/src/turbomind/utils/gemm_test/gpt_gemm_func.cc#L497-L526
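
To illustrate the failure mode (the exact difference is an assumption on my part; the real format strings are in the linked cublasAlgoMap.cc and gpt_gemm_func.cc): if the writer emits a different set of columns than the reader expects, every field after the mismatch lands in the wrong slot and the loaded algo configuration is garbage.

```python
# Illustration only: a column-count mismatch between the tuner that wrote
# gemm_config.in and a reader built against a different CUDA version.
line = "1 1 40 128 1 ### 1 15360 1 5120 113 -1 -1 -1 -1 -1 -1 -1 0.202213"
algo_fields = line.split("###")[1].split()  # 13 values written by the tuner

# A reader expecting one column fewer (say, no 'stages' field) shifts
# everything after it, so exec_time silently reads as '-1':
reader_fields = ["batchCount", "n", "m", "k", "algoId", "customOption",
                 "tile", "numSplitsK", "swizzle", "reductionScheme",
                 "workspaceSize", "exec_time"]  # hypothetical 12-column reader
misparsed = dict(zip(reader_fields, algo_fields))
print(misparsed["exec_time"])  # -> '-1', not the real timing
```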

We will add llama_gemm to the wheel package in the next release: https://github.com/InternLM/lmdeploy/pull/320

sleepwalker2017 commented 1 year ago

My CUDA version is 11.7. When llama_gemm and the package lib are compiled in the same environment, the problem is solved, is that right?

AllentDan commented 1 year ago

Sounds reasonable.