evilsocket / cake

Distributed LLM and StableDiffusion inference for mobile, desktop and server.

PTX code was compiled with an unsupported toolchain #11

Closed JKYtydt closed 1 month ago

JKYtydt commented 1 month ago

Hello, I ran into a new problem while using cake. The command I ran:

RUST_LOG=debug CUDA_VISIBLE_DEVICES=2 ./cake-cli --model /data1/pre_trained_model/Llama-3-8B-Instruct --topology /sdc/jky/cake/topology.yml

It fails with the following error:

[2024-07-17T06:24:01Z DEBUG] device is cuda 0
[2024-07-17T06:24:01Z INFO ] [Master] dtype=F16 device=Cuda(CudaDevice(DeviceId(1))) mem=220.7 MiB
[2024-07-17T06:24:01Z INFO ] loading configuration from /data1/pre_trained_model/Llama-3-8B-Instruct/config.json
[2024-07-17T06:24:01Z INFO ] loading topology from /sdc/jky/cake/topology.yml
[2024-07-17T06:24:01Z DEBUG] cache::n_elem = 128
[2024-07-17T06:24:01Z DEBUG] cache::theta = [ 1.0000e0, 8.1462e-1, 6.6360e-1, 5.4058e-1, 4.4037e-1, 3.5873e-1, 2.9223e-1,
     2.3805e-1, 1.9392e-1, 1.5797e-1, 1.2869e-1, 1.0483e-1, 8.5397e-2, 6.9566e-2,
     5.6670e-2, 4.6164e-2, 3.7606e-2, 3.0635e-2, 2.4955e-2, 2.0329e-2, 1.6560e-2,
     1.3490e-2, 1.0990e-2, 8.9523e-3, 7.2927e-3, 5.9407e-3, 4.8394e-3, 3.9423e-3,
     3.2114e-3, 2.6161e-3, 2.1311e-3, 1.7360e-3, 1.4142e-3, 1.1520e-3, 9.3847e-4,
     7.6450e-4, 6.2277e-4, 5.0732e-4, 4.1327e-4, 3.3666e-4, 2.7425e-4, 2.2341e-4,
     1.8199e-4, 1.4825e-4, 1.2077e-4, 9.8381e-5, 8.0143e-5, 6.5286e-5, 5.3183e-5,
     4.3324e-5, 3.5292e-5, 2.8750e-5, 2.3420e-5, 1.9078e-5, 1.5542e-5, 1.2660e-5,
     1.0313e-5, 8.4015e-6, 6.8440e-6, 5.5752e-6, 4.5417e-6, 3.6997e-6, 3.0139e-6,
     2.4551e-6]
    Tensor[[64], f32, cuda:0]
Error: DriverError(CUDA_ERROR_UNSUPPORTED_PTX_VERSION, "the provided PTX was compiled with an unsupported toolchain.") when loading cast_u32_f32

yaojunluo commented 1 month ago

The issue I encountered was due to a mismatch between the CUDA version reported by nvcc and the one shown by nvidia-smi. You might want to verify if this discrepancy exists in your case as well.
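
For anyone who wants to check this quickly: `nvcc --version` reports the installed CUDA toolkit release, while the header of `nvidia-smi` reports the highest CUDA version the installed driver supports. When the toolkit is newer than what the driver supports, PTX built with it can be rejected with `CUDA_ERROR_UNSUPPORTED_PTX_VERSION`. Below is a minimal sketch of that comparison. The helper names (`parse_nvcc_release`, `parse_smi_cuda_version`, `toolkit_newer_than_driver`) are made up for illustration and are not part of cake; the script assumes both tools are on `PATH` and print their usual `release X.Y` and `CUDA Version: X.Y` strings.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(text):
    """Return (major, minor) from `nvcc --version` output, or None if absent."""
    m = re.search(r"release\s+(\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def parse_smi_cuda_version(text):
    """Return (major, minor) from the `nvidia-smi` header, or None if absent."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def toolkit_newer_than_driver(toolkit, driver):
    """True when the toolkit release exceeds the driver's supported CUDA version."""
    return toolkit > driver


def run(cmd):
    """Run a command and return its stdout, or None if it is unavailable."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout


if __name__ == "__main__":
    toolkit = parse_nvcc_release(run(["nvcc", "--version"]) or "")
    driver = parse_smi_cuda_version(run(["nvidia-smi"]) or "")
    if toolkit is None or driver is None:
        print("could not determine both versions (is nvcc / nvidia-smi on PATH?)")
    elif toolkit_newer_than_driver(toolkit, driver):
        print(f"mismatch: toolkit {toolkit} is newer than driver-supported {driver}")
    else:
        print(f"ok: toolkit {toolkit} <= driver-supported {driver}")
```

If the script reports a mismatch, the usual options are to update the NVIDIA driver or to rebuild with a toolkit release no newer than the version `nvidia-smi` shows.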

JKYtydt commented 1 month ago

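@yaojunluo Following your suggestion I checked, and the CUDA version reported by nvcc indeed does not match the one shown by nvidia-smi. Did everything run normally for you once you fixed this?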

yaojunluo commented 1 month ago

Yes, it ran normally on my machine after I resolved this issue.

evilsocket commented 1 month ago

thank you so much @yaojunluo for the help! i'll close this for the time being, let me know if the problem happens again!