fangwei123456 / spikingjelly

SpikingJelly is an open-source deep learning framework for Spiking Neural Network (SNN) based on PyTorch.
https://spikingjelly.readthedocs.io

Problem when switching from CPU to GPU #331

Closed Zhxin99 closed 1 year ago

Zhxin99 commented 1 year ago

Hello, on Linux I can run spikingjelly/activation_based/examples/lif_fc_mnist.py on the CPU without any problem, but after changing the device to cuda:0 I get the error below. How can I fix this?

```
Traceback (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T> __device__ T maximum(T a, T b) { return isnan(a) ? a : (a > b ? a : b); }
template<typename T> __device__ T minimum(T a, T b) { return isnan(a) ? a : (a < b ? a : b); }

extern "C" __global__ void fused_neg_add_mul_mul_add(float* tspike_1, double vv_reset_2, float* tv_1, float* aten_add_1, float* aten_add) { { if ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x) < 160ll ? 1 : 0) { float tspike_1_1 = __ldg(tspike_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)); aten_add[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = (0.f - tspike_1_1) + 1.f; float v = __ldg(tv_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)); aten_add_1[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = ((0.f - tspike_1_1) + 1.f) * v + tspike_1_1 * (float)(vv_reset_2); }} }
```

fangwei123456 commented 1 year ago

Check whether your PyTorch installation supports the GPU.
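
A quick way to run this check is a short script like the one below (a hedged sketch; the particular diagnostics printed are my choice, not from the thread):

```python
import torch

# Basic sanity checks for GPU support in the installed PyTorch build.
print("torch version:", torch.__version__)
print("built against CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    # End-to-end check: run a tiny op on the GPU.
    x = torch.rand(4, device='cuda:0')
    print("GPU tensor sum:", (x + x).sum().item())
```

If `torch.cuda.is_available()` prints `False`, the problem is the PyTorch/CUDA install rather than SpikingJelly.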

Zhxin99 commented 1 year ago

It does.

fangwei123456 commented 1 year ago

Could you provide the complete error message?

Zhxin99 commented 1 year ago

> Could you provide the complete error message?

```
Traceback (most recent call last):
  File "/home/zhangxin/.conda/envs/dvs/lib/python3.8/site-packages/spyder_kernels/py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)
  File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/lif_stdp_mnist.py", line 303, in <module>
    main()
  File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/lif_stdp_mnist.py", line 198, in main
    out_fr += net(encoded_img)  # predict value
  File "/home/zhangxin/.conda/envs/dvs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/lif_stdp_mnist.py", line 33, in forward
    return self.layer(x)
  File "/home/zhangxin/.conda/envs/dvs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhangxin/.conda/envs/dvs/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/zhangxin/.conda/envs/dvs/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/spikingjelly1/activation_based/base.py", line 266, in forward
    return self.single_step_forward(*args, **kwargs)
  File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/spikingjelly1/activation_based/neuron.py", line 907, in single_step_forward
    return super().single_step_forward(x)
  File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/spikingjelly1/activation_based/neuron.py", line 241, in single_step_forward
    self.neuronal_reset(spike)
  File "/home/zhangxin/spikingjelly-stdp/spikingjelly/activation_based/examples/spikingjelly1/activation_based/neuron.py", line 205, in neuronal_reset
    self.v = self.jit_hard_reset(self.v, spike_d, self.v_reset)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T> __device__ T maximum(T a, T b) { return isnan(a) ? a : (a > b ? a : b); }
template<typename T> __device__ T minimum(T a, T b) { return isnan(a) ? a : (a < b ? a : b); }

extern "C" __global__ void fused_neg_add_mul_mul_add(float* tspike_1, double vv_reset_2, float* tv_1, float* aten_add_1, float* aten_add) { { if ((long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x) < 160ll ? 1 : 0) { float tspike_1_1 = __ldg(tspike_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)); aten_add[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = (0.f - tspike_1_1) + 1.f; float v = __ldg(tv_1 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)); aten_add_1[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = ((0.f - tspike_1_1) + 1.f) * v + tspike_1_1 * (float)(vv_reset_2); }} }
```

fangwei123456 commented 1 year ago

Run the following code and check whether it errors:

```python
import torch

@torch.jit.script
def jit_hard_reset(v: torch.Tensor, spike: torch.Tensor, v_reset: float):
    v = (1. - spike) * v + spike * v_reset
    return v

device = 'cuda:0'
v = torch.rand([8], device=device)
spike = torch.rand_like(v)
v_reset = 0.
z = jit_hard_reset(v, spike, v_reset)
```

Zhxin99 commented 1 year ago

It runs without errors.

fangwei123456 commented 1 year ago

That's odd. The error message above comes from the JIT compilation of jit_hard_reset, yet running that same code on its own works fine.

Zhxin99 commented 1 year ago

I'm puzzled too. It looks like I'll just have to run it on the CPU.

fangwei123456 commented 1 year ago

Is it a new 40-series GPU? PyTorch's support for the newest GPUs can be problematic.

https://github.com/pytorch/pytorch/issues/87595
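
The nvrtc `--gpu-architecture` error typically means the GPU's compute capability (sm_89 for the 40 series) is newer than any architecture the installed PyTorch build was compiled for. A quick diagnostic along these lines (a sketch, not from the thread; behavior on CPU-only builds may differ) can confirm this:

```python
import torch

# Architectures this PyTorch build ships compiled kernels / PTX for.
print("build arch list:", torch.cuda.get_arch_list())
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    arch = f"sm_{major}{minor}"  # the RTX 4090 reports sm_89
    print("device arch:", arch)
    print("covered by this build:", arch in torch.cuda.get_arch_list())
```

If the device architecture is not in the build's list, upgrading to a PyTorch build compiled for that architecture is the usual fix.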

Zhxin99 commented 1 year ago

> Is it a new 40-series GPU? PyTorch's support for the newest GPUs can be problematic.
>
> pytorch/pytorch#87595

It's an RTX 4090.

fangwei123456 commented 1 year ago

It's probably the new GPU, then. Try the latest CUDA together with a nightly build of PyTorch and see whether that fixes it:

```
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia
```

Zhxin99 commented 1 year ago

> It's probably the new GPU, then. Try the latest CUDA together with a nightly build of PyTorch and see whether that fixes it:
>
> `conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia`

OK, I'll go try that. Thanks!