Closed: sleepwalker2017 closed this issue 11 months ago
After I replaced the library with the one I compiled myself, the problem was solved.
Is there anything different in the lib you compiled?
I'm not quite sure; maybe I just added -DSM=70, maybe nothing.
It's weird.
Can you reproduce the error again?
Yes. It can be reproduced by installing lmdeploy via pip. Does it run OK on your machine?
Of course. I did not do tuning.
Let me paste my gemm_config.in. You can try putting it in your working directory.
batch_size, seq_len, head_num, size_per_head dataType ### batchCount, n, m, k, algoId, customOption, tile, numSplitsK, swizzle, reductionScheme, workspaceSize, stages, exec_time
1 1 40 128 1 ### 1 15360 1 5120 113 -1 -1 -1 -1 -1 -1 -1 0.202213
1 1 40 128 1 ### 1 5120 1 5120 110 -1 -1 -1 -1 -1 -1 -1 0.079022
1 1 40 128 1 ### 1 13824 1 5120 113 -1 -1 -1 -1 -1 -1 -1 0.188743
1 1 40 128 1 ### 1 5120 1 13824 110 -1 -1 -1 -1 -1 -1 -1 0.189207
1 1 40 128 1 ### 1 7680 1 5120 110 -1 -1 -1 -1 -1 -1 -1 0.114368
1 1 40 128 1 ### 1 5120 1 2560 21 0 11 1 0 0 0 14 0.046746
1 1 40 128 1 ### 1 6912 1 5120 108 -1 -1 -1 -1 -1 -1 -1 0.103753
1 1 40 128 1 ### 1 5120 1 6912 110 -1 -1 -1 -1 -1 -1 -1 0.102635
Where was the file put? In my understanding, the vicuna weights were changed after tuning. Right?
This file is a dict from input shapes to cuBLAS GEMM algorithms. It only decides the computing method for the GEMMs and has no effect on the weights.
Put it in your current working directory.
When you don't have this file, lmdeploy gives a warning:
[WARNING] gemm_config.in is not found; using default GEMM algo
When you put this file there, the warning won't appear.
I need to clarify: tuning here doesn't mean fine-tuning the model. It means tuning the cuBLAS GEMM algorithms.
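(Not part of the original thread, just to make the lookup idea concrete:) a minimal C++ sketch of reading gemm_config.in into a shape-keyed map, with the missing-file fallback described above. The struct and function names here are made up for illustration; the real parsing lives in cublasAlgoMap.cc, linked later in the thread.

```cpp
#include <cstdio>
#include <map>
#include <tuple>

// One row of gemm_config.in (hypothetical names, following the header line above).
struct GemmAlgo {
    int algoId, customOption, tile, numSplitsK, swizzle, reductionScheme, workspaceSize, stages;
    float exec_time;
};

// (batchCount, n, m, k) -> chosen cuBLAS GEMM algorithm. The model weights are never touched.
using AlgoMap = std::map<std::tuple<int, int, int, int>, GemmAlgo>;

AlgoMap loadGemmConfig(const char* path) {
    AlgoMap algos;
    FILE* f = std::fopen(path, "r");
    if (!f) {
        // No file -> warn and let cuBLAS pick its default algorithm.
        std::printf("[WARNING] gemm_config.in is not found; using default GEMM algo\n");
        return algos;
    }
    char line[1024];
    std::fgets(line, sizeof(line), f);  // skip the header line
    while (std::fgets(line, sizeof(line), f)) {
        int b, s, h, d, dtype, bc, n, m, k;
        GemmAlgo a;
        int got = std::sscanf(line,
            "%d %d %d %d %d ### %d %d %d %d %d %d %d %d %d %d %d %d %f",
            &b, &s, &h, &d, &dtype, &bc, &n, &m, &k,
            &a.algoId, &a.customOption, &a.tile, &a.numSplitsK, &a.swizzle,
            &a.reductionScheme, &a.workspaceSize, &a.stages, &a.exec_time);
        if (got == 18)
            algos[std::make_tuple(bc, n, m, k)] = a;
    }
    std::fclose(f);
    return algos;
}
```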
I used vicuna-7b-v1.3 since I haven't got vicuna-13B-v1.5. lmdeploy/turbomind/chat.py works fine for me, and so does python profile_generation.py workspace --input_seqlen 512 --output_seqlen 192 --test_round 1. Did you try other models?
I reproduced the error. It seems the prebuilt package lib cannot handle the gemm_config.in you provided.
@irexyc Could you have a look?
@sleepwalker2017 What was your CUDA version when you built lmdeploy? The prebuilt package is built with CUDA 11.8, and there are some differences in how gemm_config.in is parsed/written across CUDA versions:
https://github.com/InternLM/lmdeploy/blob/main/src/turbomind/utils/cublasAlgoMap.cc#L60-L66
https://github.com/InternLM/lmdeploy/blob/main/src/turbomind/utils/gemm_test/gpt_gemm_func.cc#L497-L526
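To illustrate the mechanism (a hypothetical sketch, assuming only that row widths differ across CUDA builds as the two links above show in the real code; the exact column sets here are made up), a file written by one build can silently mis-parse in another:

```cpp
#include <cstdio>

// Hypothetical sketch of version-guarded parsing; CUDART_VERSION comes
// from the CUDA toolkit headers used at build time. The point is that
// newer builds read/write extra cuBLASLt fields per row (e.g. the
// "stages" column), so row widths differ between builds.
bool parseRow(const char* line, int shape[4], int* algoId, float* time) {
    int b, s, h, d, dtype;
#if CUDART_VERSION >= 11000
    int custom, tile, splitK, swizzle, reduction, workspace, stages;
    return std::sscanf(line,
        "%d %d %d %d %d ### %d %d %d %d %d %d %d %d %d %d %d %d %f",
        &b, &s, &h, &d, &dtype, &shape[0], &shape[1], &shape[2], &shape[3],
        algoId, &custom, &tile, &splitK, &swizzle, &reduction, &workspace,
        &stages, time) == 18;
#else
    // A build expecting fewer columns, fed a file produced by a newer
    // build, either stops matching mid-row or reads wrong values.
    return std::sscanf(line, "%d %d %d %d %d ### %d %d %d %d %d %f",
        &b, &s, &h, &d, &dtype, &shape[0], &shape[1], &shape[2], &shape[3],
        algoId, time) == 11;
#endif
}
```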
We will add llama_gemm to the wheel package in the next release: https://github.com/InternLM/lmdeploy/pull/320
My CUDA version is 11.7. So if llama_gemm and the package lib are compiled in the same environment, the problem is solved. Is that right?
Sounds reasonable.
Describe the bug
I installed lmdeploy using pip.
After running this command on a V100 with vicuna-13B-v1.5:
python profile_generation.py workspace --input_seqlen 512 --output_seqlen 192 --test_round 1
Before tuning, it runs OK. After tuning, it seems to enter a dead loop with CPU usage at 100%.
After I replaced the library with the lib I compiled myself, the problem was solved.
Reproduction
python profile_generation.py workspace --input_seqlen 512 --output_seqlen 192 --test_round 1
Error traceback