Shen-Lab / GraphCL

[NeurIPS 2020] "Graph Contrastive Learning with Augmentations" by Yuning You, Tianlong Chen, Yongduo Sui, Ting Chen, Zhangyang Wang, Yang Shen
MIT License

Error information when I run gsimclr.py --DS ENZYMES --lr 0.01 --local --num-gc-layers 3 --aug random4 --seed 0 #29

Open Austinzhenghua opened 3 years ago

Austinzhenghua commented 3 years ago

600 1

lr: 0.01 num_features: 1 hidden_dim: 32 num_gc_layers: 3

/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [105,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [55,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [56,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [57,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [58,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [59,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [60,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1623448224956/work/aten/src/ATen/native/cuda/Indexing.cu:702: indexSelectLargeIndex: block: [158,0,0], thread: [61,0,0] Assertion srcIndex < srcSelectDimSize failed.
Traceback (most recent call last):
  File "/home/zhenghua/pythoncode/unsupervised_graph_TU/gsimclr.py", line 190, in <module>
    emb, y = model.encoder.get_embeddings(dataloader_eval)
  File "/home/zhenghua/pythoncode/unsupervised_graph_TU/gin.py", line 76, in get_embeddings
    x, _ = self.forward(x, edge_index, batch)
  File "/home/zhenghua/pythoncode/unsupervised_graph_TU/gin.py", line 52, in forward
    x = F.relu(self.convs[i](x, edge_index))
  File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch_geometric/nn/conv/gin_conv.py", line 64, in forward
    out = self.propagate(edge_index, x=x, size=size)
  File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py", line 253, in propagate
    out = self.aggregate(out, **aggr_kwargs)
  File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch_geometric/nn/conv/message_passing.py", line 288, in aggregate
    reduce=self.aggr)
  File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch_scatter/scatter.py", line 153, in scatter
    return scatter_sum(src, index, dim, out, dim_size)
  File "/home/zhenghua/.conda/envs/pytorchgeo/lib/python3.7/site-packages/torch_scatter/scatter.py", line 21, in scatter_sum
    return out.scatter_add_(dim, index, src)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
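The last line of the error suggests CUDA_LAUNCH_BLOCKING=1, which makes CUDA kernel launches synchronous so the traceback points at the kernel that actually failed rather than at a later call such as scatter_add_. One option is to prefix the command, e.g. `CUDA_LAUNCH_BLOCKING=1 python gsimclr.py --DS ENZYMES --lr 0.01 --local --num-gc-layers 3 --aug random4 --seed 0`. A minimal sketch of doing the same from inside the script, assuming these lines run before `import torch` (and therefore before CUDA is initialized):

```python
import os

# Make CUDA kernel launches synchronous so that a device-side assert is
# reported at the operation that triggered it, not at a later API call.
# Assumption: this runs before torch is imported, i.e. before CUDA starts.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Import torch and the rest of the training code only after this point.
```

Synchronous launches slow training down, so this setting is only meant for debugging.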

Can anyone help me figure out what is wrong with the algorithm or the environment?

The environment is as follows:

Jinja2 3.0.1 3.0.1
MarkupSafe 2.0.1 2.0.1
Pillow 8.2.0 8.2.0
PySocks 1.7.1 1.7.1
brotlipy 0.7.0 0.7.0
certifi 2020.6.20 2021.5.30
cffi 1.14.5 1.14.5
chardet 4.0.0 4.0.0
cryptography 3.4.7 3.4.7
cycler 0.10.0 0.10.0
decorator 4.4.2 5.0.9
googledrivedownloader 0.4 0.4
idna 2.10 3.2
joblib 1.0.1 1.0.1
kiwisolver 1.3.1 1.3.1
matplotlib 3.4.2 3.4.2
mkl-fft 1.3.0 1.3.0
mkl-random 1.2.1 1.2.2
mkl-service 2.3.0 2.4.0
networkx 2.5.1 2.6rc2
numpy 1.20.2 1.21.0
olefile 0.46 0.47.dev4
pandas 1.2.5 1.3.0rc1
pip 21.1.2 21.1.3
pyOpenSSL 20.0.1 20.0.1
pycparser 2.20 2.20
pyparsing 2.4.7 3.0.0b2
python-dateutil 2.8.1 2.8.1
python-louvain 0.15 0.15
pytz 2021.1 2021.1
requests 2.25.1 2.25.1
scikit-learn 0.24.2 0.24.2
scipy 1.6.2 1.7.0
seaborn 0.11.0 0.11.1
setuptools 52.0.0.post20210125 57.0.0
six 1.16.0 1.16.0
threadpoolctl 2.1.0 2.1.0
torch 1.9.0 1.9.0
torch-cluster 1.5.9 1.5.9
torch-geometric 1.7.2 1.7.2
torch-scatter 2.0.7 2.0.7
torch-sparse 0.6.10 0.6.10
torch-spline-conv 1.2.1 1.2.1
torchaudio 0.9.0a0+33b2469 0.9.0
torchvision 0.10.0 0.10.0
tornado 6.1 6.1
tqdm 4.61.1 4.61.1
typing-extensions 3.7.4.3 3.10.0.0
urllib3 1.26.6 1.26.6
wheel 0.36.2 0.36.2

yyou1996 commented 3 years ago

Hi @Austinzhenghua,

Thanks for your feedback. Does torch_geometric==1.7.2 not work for you? You could try version 1.6.0 or 1.6.1 for this experiment.

Austinzhenghua commented 3 years ago

Hi, can I have your WeChat so I can ask you some more detailed questions?

yyou1996 commented 3 years ago

Just as a test, can you run https://github.com/fanyun-sun/InfoGraph/tree/master/unsupervised, which the unsupervised_TU experiment is built on?

Austinzhenghua commented 3 years ago

Yes, I can run it, but it does not seem to use the GPU for training. The error above was caused by the version of torch_geometric. Can you run it on your computer? Thanks a lot!

Austinzhenghua commented 3 years ago

Traceback (most recent call last):
  File "/home/zhenghua/pythoncode/unsupervised_TU_zh/gsimclr.py", line 189, in <module>
    emb, y = model.encoder.get_embeddings(dataloader_eval)
  File "/home/zhenghua/pythoncode/unsupervised_TU_zh/gin.py", line 77, in get_embeddings
    x, _ = self.forward(x, edge_index, batch)
  File "/home/zhenghua/pythoncode/unsupervised_TU_zh/gin.py", line 52, in forward
    x = F.relu(self.convs[i](x, edge_index))
  File "/home/zhenghua/.conda/envs/graphcontra/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zhenghua/.conda/envs/graphcontra/lib/python3.6/site-packages/torch_geometric/nn/conv/gin_conv.py", line 63, in forward
    out = self.propagate(edge_index, x=x, size=size)
  File "/home/zhenghua/.conda/envs/graphcontra/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 233, in propagate
    kwargs)
  File "/home/zhenghua/.conda/envs/graphcontra/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 158, in __collect__
    j if arg[-2:] == '_j' else i)
  File "/home/zhenghua/.conda/envs/graphcontra/lib/python3.6/site-packages/torch_geometric/nn/conv/message_passing.py", line 127, in __lift__
    return src.index_select(self.node_dim, index)
RuntimeError: index out of range: Tried to access index 4324 out of table with 4323 rows. at /opt/conda/conda-bld/pytorch_1579027003190/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:418

I get this error when I run it on the CPU.
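Both tracebacks most likely point at the same kind of lookup: message passing gathers node features by index (the CPU traceback ends in `src.index_select(self.node_dim, index)`, and the CUDA assertion comes from the `indexSelectLargeIndex` kernel), so every entry of `edge_index` must be smaller than the number of rows in `x`. "index 4324 out of table with 4323 rows" means some edge refers to a node that `x` does not contain. The pure-Python sketch below mimics that lookup with plain lists; `lift` and `find_out_of_range_edges` are hypothetical helpers for illustration, not PyG functions.

```python
from typing import List, Sequence, Tuple


def lift(x: Sequence[Sequence[float]], index: Sequence[int]) -> List[Sequence[float]]:
    """Gather rows of x by index, like x.index_select(0, index) on a tensor."""
    num_rows = len(x)
    out = []
    for i in index:
        if not 0 <= i < num_rows:
            # Same wording as the CPU error reported in this thread.
            raise IndexError(
                f"index out of range: Tried to access index {i} out of table with {num_rows} rows"
            )
        out.append(x[i])
    return out


def find_out_of_range_edges(
    edge_index: Tuple[Sequence[int], Sequence[int]], num_nodes: int
) -> List[int]:
    """Return the column positions whose source or target is not a valid node id."""
    src, dst = edge_index
    return [
        col
        for col, (s, t) in enumerate(zip(src, dst))
        if not (0 <= s < num_nodes and 0 <= t < num_nodes)
    ]


# Toy graph: 3 nodes with 1 feature each, but the last edge points at node 3.
x = [[1.0], [2.0], [3.0]]
edge_index = ([0, 1, 2], [1, 2, 3])

print(find_out_of_range_edges(edge_index, num_nodes=len(x)))  # [2]
try:
    lift(x, edge_index[1])  # gather the features of each edge's target node
except IndexError as err:
    print(err)  # index out of range: Tried to access index 3 out of table with 3 rows
```

With tensors, the equivalent check before the forward pass is whether `edge_index.max()` is smaller than `x.size(0)`.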

Austinzhenghua commented 3 years ago

[Two screenshots, not preserved]

I found that the shape of x differs between your code and InfoGraph; the first screenshot is from InfoGraph.
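A different shape of x matters because of how batching works: graphs are concatenated into one large disjoint graph, and each graph's edge indices are shifted by the total node count of the graphs before it (PyG's `Batch` does this for `edge_index`). If the node count used for that shift disagrees with the rows actually stored in x, indices in later graphs overshoot, which would produce errors like the ones above. This is a hypothetical plain-Python illustration (`batch_edges` is not a PyG function), not a diagnosis of this particular bug:

```python
from typing import List, Tuple

Edge = Tuple[int, int]


def batch_edges(graphs: List[Tuple[int, List[Edge]]]) -> Tuple[int, List[Edge]]:
    """Concatenate graphs into one disjoint graph.

    Each graph is (num_nodes, edges). Edge endpoints of graph k are shifted by
    the total num_nodes of graphs 0..k-1, the usual offset used when batching.
    Returns (total node count, shifted edges).
    """
    offset = 0
    batched: List[Edge] = []
    for num_nodes, edges in graphs:
        batched.extend((s + offset, t + offset) for s, t in edges)
        offset += num_nodes
    return offset, batched


# Two graphs whose x really has 3 and 2 rows (5 rows after batching).
real_rows = 3 + 2

# Consistent counts: the largest index is 4, which fits in 5 rows.
_, ok_edges = batch_edges([(3, [(0, 1), (1, 2)]), (2, [(0, 1)])])
print(max(max(e) for e in ok_edges) < real_rows)  # True

# Stale count: the first graph claims 4 nodes although x kept only 3 rows,
# so the second graph's indices are shifted one position too far.
_, bad_edges = batch_edges([(4, [(0, 1), (1, 2)]), (2, [(0, 1)])])
print(max(max(e) for e in bad_edges))  # 5, but x only has rows 0..4
```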

yyou1996 commented 3 years ago

It works well on my machine. What command did you use? Please take a look at the README: https://github.com/Shen-Lab/GraphCL/tree/master/unsupervised_TU#readme.

ztk1996 commented 2 years ago

I have the same error. Have you fixed it?

yyou1996 commented 2 years ago

Hi @ztk1996,

I remember testing the command, and it worked fine on my machine. Could you also share your environment and the command you ran?
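To share the environment in one go, here is a small stdlib-only sketch that reports the Python version, the versions of the PyTorch/PyG packages discussed in this thread, and whether torch sees a GPU. The package list is an assumption and can be edited; `pip list` or `conda list` gives the same information in more detail.

```python
import importlib
import platform
from typing import Dict, Iterable, Optional

# Packages discussed in this thread (an assumption; edit as needed).
PACKAGES = [
    "torch",
    "torch_geometric",
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
    "torch_spline_conv",
]


def collect_versions(modules: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each module name to its __version__ (None if it is not installed)."""
    versions: Dict[str, Optional[str]] = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        except Exception as exc:
            # e.g. a compiled extension built against a different torch version
            versions[name] = "import failed: " + type(exc).__name__
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("python", platform.python_version())
    for name, version in collect_versions(PACKAGES).items():
        print(name, version if version is not None else "not installed")
    try:
        import torch

        # CUDA version torch was built with, and whether a GPU is visible.
        print("torch CUDA build:", torch.version.cuda)
        print("GPU available:", torch.cuda.is_available())
    except ImportError:
        print("torch is not installed")
```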

ztk1996 commented 2 years ago

Thanks for your reply. The error information when I run "./go.sh 1 AIDS subgraph" on the CPU is as follows.

torch: 1.7.0 torch-geometric: 1.7.2
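The `./go.sh 1 AIDS subgraph` command uses the subgraph augmentation, which keeps only part of each graph. However the augmentation is implemented, removing nodes means shrinking x and filtering and relabeling edge_index together; if only one of them changes, edges point at rows that no longer exist, which is one plausible route to the index errors in this thread. A minimal sketch of doing both consistently on plain lists (`keep_nodes` is a hypothetical helper, not GraphCL's augmentation code):

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def keep_nodes(
    x: Sequence[Sequence[float]], edges: Sequence[Edge], keep: Sequence[int]
) -> Tuple[List[Sequence[float]], List[Edge]]:
    """Restrict a graph to the nodes in `keep`, relabeling them 0..len(keep)-1.

    Edges with an endpoint outside `keep` are dropped; the remaining endpoints
    are mapped to their new positions so they index the shrunken x.
    """
    keep_sorted = sorted(set(keep))
    new_id: Dict[int, int] = {old: new for new, old in enumerate(keep_sorted)}
    new_x = [x[old] for old in keep_sorted]
    new_edges = [
        (new_id[s], new_id[t]) for s, t in edges if s in new_id and t in new_id
    ]
    return new_x, new_edges


x = [[0.0], [1.0], [2.0], [3.0]]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
sub_x, sub_edges = keep_nodes(x, edges, keep=[1, 2, 3])
print(sub_x)      # [[1.0], [2.0], [3.0]]
print(sub_edges)  # [(0, 1), (1, 2)]
```

Every relabeled endpoint is smaller than `len(sub_x)`, which is exactly the invariant the failing index_select needs.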

yyou1996 commented 2 years ago

@ztk1996

Please try running with torch-geometric==1.6.0 on the GPU. Since both of you are using torch-geometric>=1.7.0 on the CPU, I suspect that might be the source of the error.

ztk1996 commented 2 years ago

I tried running with torch_geometric==1.6.0 and pytorch==1.7.0 on the GPU, and the error information is as follows.

Besides, when I run with torch_geometric==1.6.0 and pytorch==1.7.0 on the CPU, the error is the same as with torch_geometric==1.7.2.

yyou1996 commented 2 years ago

@ztk1996

My impression is that the versions of torch_geometric and pytorch should be consistent (see https://github.com/rusty1s/pytorch_geometric). If using torch_geometric==1.6, I would also use pytorch==1.6. Please let me know if this doesn't work either. Thanks.