HazyResearch / hgcn

Hyperbolic Graph Convolutional Networks in PyTorch.

When I train the hgcn and hnn models on my own dataset, the curvature is 'nan' and an error is raised. #28

Open Abigale001 opened 3 years ago

Abigale001 commented 3 years ago

/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/BCECriterion.cu:42: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::detail::tuple_of_iterator_references<thrust::device_reference, thrust::device_reference, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [11,0,0], thread: [209,0,0] Assertion input >= 0. && input <= 1. failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/BCECriterion.cu:42: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::detail::tuple_of_iterator_references<thrust::device_reference, thrust::device_reference, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [11,0,0], thread: [210,0,0] Assertion input >= 0. && input <= 1. failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/BCECriterion.cu:42: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::detail::tuple_of_iterator_references<thrust::device_reference, thrust::device_reference, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [11,0,0], thread: [211,0,0] Assertion input >= 0. && input <= 1. failed.
/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THCUNN/BCECriterion.cu:42: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::detail::tuple_of_iterator_references<thrust::device_reference, thrust::device_reference, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [11,0,0], thread: [212,0,0] Assertion input >= 0. && input <= 1. failed.

torch.Size([80362])
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THC/THCCachingHostAllocator.cpp line=265 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "/data1/home/ideatmp/sigir21/hgcn/models/base_models.py", line 124, in compute_metrics
    loss = F.binary_cross_entropy(pos_scores, torch.ones_like(pos_scores))
  File "/data1/home/ideatmp/miniconda3/envs/HGN/lib/python3.6/site-packages/torch/nn/functional.py", line 2027, in binary_cross_entropy
    input, target, weight, reduction_enum)
RuntimeError: reduce failed to synchronize: device-side assert triggered

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data1/home/ideatmp/.pycharm_helpers/pydev/pydevd.py", line 1668, in <module>
    main()
  File "/data1/home/ideatmp/.pycharm_helpers/pydev/pydevd.py", line 1662, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/data1/home/ideatmp/.pycharm_helpers/pydev/pydevd.py", line 1072, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/data1/home/ideatmp/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/data1/home/ideatmp/sigir21/hgcn/train.py", line 213, in <module>
    train(args, feature)
  File "/data1/home/ideatmp/sigir21/hgcn/train.py", line 129, in train
    train_metrics = model.compute_metrics(embeddings, data, 'train')
  File "/data1/home/ideatmp/sigir21/hgcn/models/base_models.py", line 127, in compute_metrics
    print(pos_scores)
  File "/data1/home/ideatmp/miniconda3/envs/HGN/lib/python3.6/site-packages/torch/tensor.py", line 66, in __repr__
    return torch._tensor_str._str(self)
  File "/data1/home/ideatmp/miniconda3/envs/HGN/lib/python3.6/site-packages/torch/_tensor_str.py", line 277, in _str
    tensor_str = _tensor_str(self, indent)
  File "/data1/home/ideatmp/miniconda3/envs/HGN/lib/python3.6/site-packages/torch/_tensor_str.py", line 195, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/data1/home/ideatmp/miniconda3/envs/HGN/lib/python3.6/site-packages/torch/_tensor_str.py", line 221, in get_summarized_data
    return torch.cat((self[:PRINT_OPTS.edgeitems], self[-PRINT_OPTS.edgeitems:]))
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THC/THCCachingHostAllocator.cpp:265

I debugged the error and found that the curvature becomes nan during training. How can I solve this problem?
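The assertion `input >= 0. && input <= 1.` in the log fails for any score outside [0, 1], and a nan score fails it too, since every comparison with nan is false. A minimal sketch of a check that could run just before the loss call to produce a readable Python error instead of a device-side assert. The helper name `checked_bce_ones` is hypothetical and not part of the hgcn code:

```python
import torch
import torch.nn.functional as F


def checked_bce_ones(scores: torch.Tensor) -> torch.Tensor:
    """Binary cross entropy against all-ones targets, with explicit input checks.

    Hypothetical helper for debugging; it mirrors the call shown in the
    traceback but validates the scores first.
    """
    # Inspect a detached CPU copy so the check itself does not launch the
    # CUDA kernel that contains the assertion.
    s = scores.detach().cpu()
    if torch.isnan(s).any():
        raise ValueError("scores contain nan; inspect the curvature and embeddings upstream")
    if s.min() < 0 or s.max() > 1:
        raise ValueError(
            "scores fall outside [0, 1]: min={:.4g}, max={:.4g}".format(
                s.min().item(), s.max().item()
            )
        )
    return F.binary_cross_entropy(scores, torch.ones_like(scores))


# Valid probabilities pass through to the usual loss.
good_scores = torch.tensor([0.9, 0.6, 0.75])
loss = checked_bce_ones(good_scores)
```

Running the failing step on the CPU, or with the environment variable `CUDA_LAUNCH_BLOCKING=1`, may also help, because the error is then reported at the call that caused it.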

gshahaaa commented 2 years ago

I get the same problem when I use my own dataset. I found that the value of the curvature at this line is a negative number, which results in the nan: https://github.com/HazyResearch/hgcn/blob/a526385744da25fc880f3da346e17d0fe33817f8/manifolds/poincare.py#L74 How can this be solved?
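The nan described in this comment can be reproduced in isolation. A minimal sketch, assuming the linked line takes a square root of the curvature `c` (for example as `c ** 0.5`). The names `safe_sqrt_c` and `min_c` are hypothetical, and the clamp is only one possible guard, not the repository's fix:

```python
import torch

# A negative curvature value, as reported above.
c = torch.tensor([-0.5])

# A fractional power of a negative float is nan in PyTorch.
sqrt_c = c ** 0.5


def safe_sqrt_c(c: torch.Tensor, min_c: float = 1e-4) -> torch.Tensor:
    """Square root of the curvature after clamping it to a small positive floor.

    Hypothetical guard: min_c is an arbitrary illustrative value.
    """
    return torch.clamp(c, min=min_c) ** 0.5


guarded = safe_sqrt_c(c)
```

Note that a clamp passes no gradient while `c` is below the floor, so if the curvature is a trainable parameter it may be preferable to keep it positive by construction, for instance by training an unconstrained value and mapping it through `torch.nn.functional.softplus`.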