Qsingle / MegEngine_CU11

MegEngine build with cu11x

how to install megengine cuda11.4 cudnn8.2.4? #2

Open Ctu-SiTuShenPeng opened 2 years ago

Ctu-SiTuShenPeng commented 2 years ago

Hello, my current environment is Python 3.8, CUDA 11.4, and cuDNN 8.2.4. How should I install MegEngine? I tried your whl, but it was built against CUDA 11.2 and cuDNN 8.1.1, so it fails with:

RuntimeError: assertion `CUDNN_VERSION == cudnnGetVersion()' failed at ../../../../../../dnn/src/cuda/handle.cpp:58: megdnn::cuda::HandleImpl::HandleImpl(megcoreComputingHandle_t) extra message: cudnn version mismatch: compiled with 8101; detected 8204 at runtime, may caused by customized environment, for example LD_LIBRARY_PATH on LINUX and PATH on Windows!!
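The two integers in the message can be decoded by hand. A minimal sketch, assuming the cuDNN 8.x encoding `CUDNN_VERSION = major * 1000 + minor * 100 + patch`; `decode_cudnn_version` is a hypothetical helper for illustration, not part of MegEngine or cuDNN:

```python
def decode_cudnn_version(version: int) -> tuple:
    """Split a cuDNN 8.x style version integer into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


# The values reported in the error message above.
compiled = decode_cudnn_version(8101)   # version the whl was built with
detected = decode_cudnn_version(8204)   # version found at runtime
print("compiled with", compiled, "detected", detected)
```

Under that assumption, 8101 corresponds to cuDNN 8.1.1 (the whl) and 8204 to cuDNN 8.2.4 (the locally installed library), which matches the environment described above.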

Qsingle commented 2 years ago

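When the whl is packaged, its rpath is modified. (The official build does this mainly to spare users from installing CUDA and cuDNN themselves, so the CUDA and cuDNN dependencies are bundled and this version check is included.) That conflicts with whatever is in PATH, though. You can remove the CUDA- and cuDNN-related entries from PATH; otherwise the version used at compile time will clash with the version you have installed.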
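One way to see what would have to go is to filter the search path programmatically. A rough sketch, assuming the offending entries can be recognized by the substrings `cuda` or `cudnn` in the directory name; `strip_cuda_entries` and its keyword list are hypothetical, and your own paths may be named differently:

```python
import os


def strip_cuda_entries(path_value, sep=os.pathsep, keywords=("cuda", "cudnn")):
    """Return path_value without entries whose name mentions CUDA or cuDNN.

    The substring match is an assumption about how the directories are
    named; adjust `keywords` to fit the actual installation.
    """
    kept = [
        entry
        for entry in path_value.split(sep)
        if entry and not any(key in entry.lower() for key in keywords)
    ]
    return sep.join(kept)


# The error message names LD_LIBRARY_PATH on Linux and PATH on Windows.
variable = "PATH" if os.name == "nt" else "LD_LIBRARY_PATH"
cleaned = strip_cuda_entries(os.environ.get(variable, ""))
print(f"{variable}={cleaned}")
```

The script only prints the cleaned value. On Linux the dynamic loader reads `LD_LIBRARY_PATH` when the process starts, so the cleaned value should be exported in the shell before launching Python rather than assigned inside the running interpreter.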

Ctu-SiTuShenPeng commented 2 years ago

OK, thanks. One more question: will a 920M graphics card be a problem? I'm not sure whether its compute capability is sufficient.

Ctu-SiTuShenPeng commented 2 years ago

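Another question: can this framework currently be installed from source alone, without bundling CUDA into it? I have only just started learning this framework and don't understand it well yet.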

Qsingle commented 2 years ago

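For now CUDA gets compiled in, so that isn't possible. You could try installing the official release directly: it doesn't modify the rpath but links statically instead, so this problem won't occur. As for the 920M, I don't remember for certain, but I think CUDA 11 doesn't support it; you can look up which architectures CUDA 11 supports.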

Ctu-SiTuShenPeng commented 2 years ago

Thanks, I'll give it a try.

Ctu-SiTuShenPeng commented 2 years ago

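Hi, I have a question about something I'm not familiar with: does this error mean I need to compile a build that matches my machine?

MegEngine version: 1.9.0
GPU model: Nvidia GeForce 920M
OS: Ubuntu
Python version: 3.8.10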

Full error message:

RuntimeError: cuda error invalid device function(98) occurred; expr: cudaOccupancyMaxPotentialBlockSizeVariableSMem( &ret.grid_size, &ret.block_size, kern, s) error file:../../../../../../dnn/src/cuda/query_blocksize_impl.cu:50

backtrace:
/usr/local/lib/python3.8/site-packages/megengine/core/lib/libmegengine_shared.so(_ZN3mgb13MegBrainErrorC1ERKSs+0x4a) [0x7ff04f023f8a]
/usr/local/lib/python3.8/site-packages/megengine/core/lib/libmegengine_shared.so(+0x20ed8a7) [0x7ff04f0718a7]
/usr/local/lib/python3.8/site-packages/megengine/core/lib/libmegengine_shared.so(_ZN6megdnn12ErrorHandler15on_megdnn_errorERKSs+0x14) [0x7ff04f560f84]
/usr/local/lib/python3.8/site-packages/megengine/core/lib/libmegengine_shared.so(_ZN6megdnn12ErrorHandler15on_megdnn_errorEPKc+0x22) [0x7ff04f562a82]
/usr/local/lib/python3.8/site-packages/megengine/core/lib/libmegengine_shared.so(_ZN6megdnn4cuda20throw_cuda_errorE9cudaErrorPKc+0x35) [0x7ff051265405]
/usr/local/lib/python3.8/site-packages/megengine/core/lib/libmegengine_shared.so(_ZN6megdnn4cuda6detail39query_launch_config_for_kernel_uncachedEPKvRKNS0_10SmemGetterE+0x131) [0x7ff053552e21]
/usr/local/lib/python3.8/site-packages/megengine/core/lib/libmegengine_shared.so(_ZN6megdnn4cuda30query_launch_config_for_kernelEPKvRKNS0_10SmemGetterE+0x312) [0x7ff05122a9e2]
/usr/local/lib/python3.8/site-packages/megengine/core/lib/libmegengine_shared.so(_ZN6megdnn4cuda13elemwise_intl15get_launchspecEPKvmPiS4+0x3c) [0x7ff0511c70ec]
/usr/local/lib/python3.8/site-packages/megengine/core/lib/libmegengine_shared.so(_ZN6megdnn4cuda22noncontig_general_intl13UserOpInvokerIiLi2EE9dispatch2INS0_16ParamElemVisitorILi1EiLNS0_10ContigTypeE1EEEEEvv+0xbc) [0x7ff0538442ec]
/usr/local/lib/python3.8/site-packages/megengine/core/lib/libmegengine_shared.so(_ZN6megdnn4cuda22copy_noncontig_generalERKNS_8TensorNDES3_P11CUstream_st+0x2b54) [0x7ff053742384]

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ctu/Ctu_Project/Ctu_MegEngine/Classification/check_framwork.py", line 85, in <module>
    optimizer = SGD(net.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
  File "/usr/local/lib/python3.8/site-packages/megengine/optimizer/sgd.py", line 48, in __init__
    super().__init__(params, defaults)
  File "/usr/local/lib/python3.8/site-packages/megengine/optimizer/optimizer.py", line 73, in __init__
    self.add_param_group(group)
  File "/usr/local/lib/python3.8/site-packages/megengine/optimizer/optimizer.py", line 100, in add_param_group
    param._reset(Tensor(param.numpy(), no_cache=True))
  File "/usr/local/lib/python3.8/site-packages/megengine/tensor.py", line 142, in numpy
    return super().numpy()
megengine.core._imperative_rt.core2.AsyncError: An async error is reported. See above for the actual cause. Hint: This is where it is reported, not where it happened. You may call `megengine.config.async_level = 0` to get better error reporting.
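For context, CUDA generally reports "invalid device function" when the binary contains no kernel compiled for the GPU's architecture, which fits the earlier suggestion to check which architectures are supported. The sketch below only illustrates the usual compatibility rule (machine code built for `sm_XY` runs on devices with the same major version and an equal or higher minor version; embedded PTX can be JIT-compiled for any equal or newer capability). The architecture list and the (3, 5) capability for the 920M are assumptions for illustration, not values read from this whl; check NVIDIA's compute capability table and the actual build flags.

```python
def can_run(device_cc, sass_archs, ptx_archs=()):
    """Check whether a binary has a usable kernel image for a device.

    device_cc  -- (major, minor) compute capability of the GPU
    sass_archs -- capabilities the binary has machine code (cubin) for
    ptx_archs  -- capabilities the binary embeds PTX for (JIT fallback)
    """
    for major, minor in sass_archs:
        # cubin is only compatible within the same major architecture.
        if device_cc[0] == major and device_cc[1] >= minor:
            return True
    # PTX can be JIT-compiled for any device at or above its target.
    return any(tuple(device_cc) >= tuple(arch) for arch in ptx_archs)


# Hypothetical architecture list for a CUDA 11 build (assumption).
assumed_build_archs = [(6, 1), (7, 0), (7, 5), (8, 0), (8, 6)]
# Assumed compute capability of the GeForce 920M (verify before relying on it).
assumed_920m_cc = (3, 5)

print(can_run(assumed_920m_cc, assumed_build_archs))
```

If the check comes out negative for the real architecture list, a prebuilt whl cannot run on that card, and the remaining options are a build compiled for the card's architecture (where the CUDA version still supports it) or running on CPU.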