ChandlerBang / GCond

[ICLR'22] [KDD'22] [IJCAI'24] Implementation of "Graph Condensation for Graph Neural Networks"
https://www.cs.emory.edu/~wjin30/files/GCond.pdf

RuntimeError: CUDA error: operation not supported when calling `cusparseCreate(handle)` #14

Open zzidlezz opened 3 months ago

zzidlezz commented 3 months ago

When I run the following command:

python train_gcond_transduct.py --dataset cora --nlayers=2 --lr_feat=1e-4 --gpu_id=0 --lr_adj=1e-4 --r=0.5

the testing phase after 400 training epochs fails with the following error:

File "E:\数据蒸馏代码\GCond-main\models\gcn.py", line 43, in forward Epoch 350, loss_avg: 0.15287520616782013 Epoch 400, loss_avg: 0.15193571531805686 Traceback (most recent call last): File "E:\数据蒸馏代码\GCond-main\train_gcond_transduct.py", line 57, in agent.train() File "E:\数据蒸馏代码\GCond-main\gcond_agent_transduct.py", line 271, in train res.append(self.test_with_val()) File "E:\数据蒸馏代码\GCond-main\gcond_agent_transduct.py", line 99, in test_with_val model.fit_with_val(feat_syn, adj_syn, labels_syn, data, File "E:\数据蒸馏代码\GCond-main\models\gcn.py", line 255, in fit_with_val self._train_with_val(labels, data, train_iters, verbose) File "E:\数据蒸馏代码\GCond-main\models\gcn.py", line 289, in _train_with_val output = self.forward(feat_full, adj_full_norm) File "E:\数据蒸馏代码\GCond-main\models\gcn.py", line 100, in forward x = layer(x, adj) File "D:\ProgramData\Anaconda3\envs\graph_cond\lib\site-packages\torch\nn\modules\module.py", line 727, in _call_impl result = self.forward(*input, **kwargs) File "E:\数据蒸馏代码\GCond-main\models\gcn.py", line 43, in forward output = torch.spmm(adj, support) RuntimeError: CUDA error: operation not supported when calling cusparseCreate(handle)

From what I found, this error is usually related to the CUDA version, but the environment I use is consistent with the one you provided. Could you suggest a solution?

My environment configuration is as follows:

Package Version

ase 3.22.1
certifi 2022.12.7
charset-normalizer 3.3.2
colorama 0.4.6
cycler 0.11.0
Cython 0.29.14
deeprobust 0.2.4
fonttools 4.38.0
gensim 3.8.3
googledrivedownloader 0.4
h5py 3.8.0
idna 3.7
imageio 2.31.2
importlib-metadata 4.13.0
isodate 0.6.1
Jinja2 3.1.4
joblib 1.3.2
kiwisolver 1.4.5
littleutils 0.2.2
llvmlite 0.39.1
MarkupSafe 2.1.5
matplotlib 3.5.3
networkx 2.6.3
numba 0.56.4
numpy 1.21.6
ogb 1.3.0
outdated 0.2.2
packaging 24.0
pandas 1.3.5
Pillow 9.5.0
pip 22.3.1
protobuf 4.24.4
pyparsing 3.1.2
python-dateutil 2.9.0.post0
python-louvain 0.16
pytz 2024.1
PyWavelets 1.3.0
rdflib 6.3.2
requests 2.31.0
scikit-image 0.19.3
scikit-learn 1.0.2
scipy 1.7.3
setuptools 65.6.3
six 1.16.0
smart-open 7.0.4
tensorboardX 2.6.2.2
texttable 1.7.0
threadpoolctl 3.1.0
tifffile 2021.11.2
torch 1.7.1+cu110
torch-cluster 1.5.9
torch-geometric 1.6.3
torch-scatter 2.0.7
torch-sparse 0.6.8
torch-spline-conv 1.2.1
torchaudio 0.7.2
torchvision 0.8.2+cu110
tqdm 4.66.4
typing_extensions 4.7.1
urllib3 2.0.7
wheel 0.38.4
wincertstore 0.2
wrapt 1.16.0
zipp 3.15.0

ChandlerBang commented 3 months ago

Hmmmm, we only tested the code on Linux/macOS environments, so it could be an issue specific to Windows.

I would suggest testing torch.spmm() in a separate file and trying to adjust the PyTorch or CUDA versions.
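
For reference, here is a minimal standalone sketch of such a test (the tensor sizes and device index are arbitrary, assuming a CUDA GPU is available):

# Minimal check of torch.spmm on the GPU; a sketch with arbitrary small tensors.
import torch

device = torch.device('cuda:0')

# A tiny 3x3 sparse adjacency matrix in COO format and a dense 3x4 feature matrix.
indices = torch.tensor([[0, 1, 2], [2, 0, 1]])
values = torch.tensor([1.0, 2.0, 3.0])
adj = torch.sparse_coo_tensor(indices, values, (3, 3)).to(device)
support = torch.randn(3, 4, device=device)

# If cuSPARSE cannot be initialized in this build, this call should raise the same
# "operation not supported when calling cusparseCreate(handle)" error seen in GCond.
print(torch.spmm(adj, support))

If this snippet alone reproduces the error, the problem lies in the PyTorch/CUDA installation rather than in the GCond code.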

zzidlezz commented 3 months ago

Thank you for your reply. I noticed that some of your follow-up papers directly reuse the results from the original paper for comparison. If my PyTorch or CUDA versions are different, can I also cite your results directly?

rockcor commented 3 months ago

> Thank you for your reply. I noticed that some of your follow-up papers directly reuse the results from the original paper for comparison. If my PyTorch or CUDA versions are different, can I also cite your results directly?

Your torch-sparse 0.6.8 may not be installed properly. You can install its CUDA build with: pip install torch_sparse -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
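
As a side check, here is a sketch (assuming a CUDA GPU is available; the matrix shapes are arbitrary) for verifying that the installed torch_sparse ships working CUDA kernels. Note that the failing call in the traceback is torch.spmm, which uses cuSPARSE directly, so this only helps rule torch_sparse in or out:

# Check that torch_sparse can run a sparse-dense matmul on the GPU (sketch).
import torch
from torch_sparse import SparseTensor

device = torch.device('cuda:0')
row = torch.tensor([0, 1, 2], device=device)
col = torch.tensor([2, 0, 1], device=device)
value = torch.tensor([1.0, 2.0, 3.0], device=device)

# Build a small 3x3 sparse matrix and multiply it with a dense 3x4 matrix on the GPU;
# a torch_sparse build without CUDA support is expected to fail here.
adj = SparseTensor(row=row, col=col, value=value, sparse_sizes=(3, 3))
x = torch.randn(3, 4, device=device)
print(adj.matmul(x))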

I can run it on Windows (CUDA 11.3) with the following environment:

deeprobust==0.2.10
matplotlib==3.5.2
networkx==2.8
numpy==1.24.4
ogb==1.3.6
PyGSP==0.5.1
scikit_learn==1.3.0
scipy==1.13.1
sortedcontainers==2.4.0
torch==1.12.1
torch_geometric==2.5.3
torch_scatter==2.0.9
torch_sparse==0.6.16+pt112cu113
tqdm==4.64.0
ChandlerBang commented 3 months ago

> Thank you for your reply. I noticed that some of your follow-up papers directly reuse the results from the original paper for comparison. If my PyTorch or CUDA versions are different, can I also cite your results directly?

Yeah, as long as you are working on the same setting, referencing my results should be fine. Thanks.