Closed: pseudotensor closed this issue 2 months ago
Thanks for reporting the bug.
Please attach the output of lmdeploy check_env and set the environment variables TM_DEBUG_LEVEL=DEBUG / TM_LOG_LEVEL=INFO before trying to reproduce it again. This can help us locate the bug.
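For anyone following along, here is a minimal sketch (assuming the Python pipeline API rather than the CLI; the model path and prompt are placeholders, not taken from the report) of how those debug settings could be applied before the engine is created:

```python
import os

# Set the TurboMind log/debug levels before the engine is created,
# so the backend picks them up at initialization.
os.environ["TM_DEBUG_LEVEL"] = "DEBUG"
os.environ["TM_LOG_LEVEL"] = "INFO"

from lmdeploy import pipeline

# Placeholder model path; substitute the model from the report.
pipe = pipeline("OpenGVLab/InternVL2-8B")
print(pipe("hello"))
```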
Aside from the InternVL2_llama3-76B model, which does not fit on a single GPU, have you tried InternVL2-8B or InternVL2-26B without tensor parallelism on the H100 GPU?
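For reference, a single-GPU run without tensor parallelism with lmdeploy's pipeline API might look like the sketch below; the model name is one of those mentioned above, while the prompt and image URL are placeholders:

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# tp=1 disables tensor parallelism, so the whole model runs on one GPU.
pipe = pipeline(
    "OpenGVLab/InternVL2-8B",
    backend_config=TurbomindEngineConfig(tp=1),
)

# Placeholder image URL, for illustration only.
image = load_image("https://example.com/sample.jpg")
response = pipe(("describe this image", image))
print(response.text)
```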
As with the other issue (https://github.com/InternLM/lmdeploy/issues/2164), which may be identical, InternVL-Chat-V1-5 also does the same thing after the commit hash I mentioned. I did not bisect where the bug was introduced myself. That V1-5 run uses a single GPU on my H100.
We have tried several settings with InternVL-Chat-V1-5 (tp=1, 2, 4, CUDA 11/12) on our A100 server but failed to reproduce it (we don't have access to H100 machines at the moment).
It seems that #2082 (263e8cfbced7d8261a1f66223ade9427af795eba) breaks TP on some machines without NVLink. I'm not sure whether it affects your system. I created #2218 to address it, but I'm struggling to verify whether it works since I can't reproduce the issue myself.
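As a rough way to check whether a machine might fall into the no-NVLink case, GPU peer-to-peer access can be queried with PyTorch. This is only an approximation (P2P can also be available over PCIe without NVLink), so treat the output as a hint rather than proof:

```python
import torch

# Query whether each GPU pair can access the other's memory directly.
# NVLink-connected pairs normally report True; a False is a hint that
# the no-NVLink code path discussed above may be taken.
n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: peer access = {ok}")
```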
Because CUDA kernel launches are asynchronous, the crash site is usually not where the error actually occurs. It would be very helpful if you could reproduce the bug with the environment variables I mentioned before (which insert synchronization and error checks after almost all kernel launches).
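To illustrate why the crash site can be misleading, here is a small sketch using plain PyTorch (not TurboMind internals): forcing synchronous launches makes a CUDA error surface at the kernel that caused it. CUDA_LAUNCH_BLOCKING is the standard CUDA runtime variable for this; it is separate from the TM_* variables mentioned above.

```python
import os

# Force every kernel launch to block until completion, so a CUDA error
# is reported at the launch that caused it rather than at a later call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

x = torch.randn(1024, 1024, device="cuda")
y = x @ x                  # without blocking, a fault in this kernel...
torch.cuda.synchronize()   # ...might only be reported here or even later
```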
@pseudotensor could you help by trying the latest v0.5.3?
After checking with other community users, we have confirmed that this issue is resolved.
Checklist
Describe the bug
illegal memory access was encountered /opt/lmdeploy/src/turbomind/utils/allocator.h:233
Reproduction
build
Dockerfile.internvl3 file:
build
run
run script:
gives on the server:
This happens every time.
Environment
Error traceback