DeepGraphLearning / NBFNet

Official implementation of Neural Bellman-Ford Networks (NeurIPS 2021)
MIT License

Unable to run the code with ImportError in cpp_extension #8

JiaangL closed this issue 2 years ago

JiaangL commented 2 years ago

Hi! I followed the instructions to install the packages, but I now get an ImportError when reproducing the results. The error is shown below. I also tried rm -r ~/.cache/torch_extensions/* as suggested in the README, but that causes more errors.

Traceback (most recent call last):
  File "script/run.py", line 69, in <module>
    train_and_validate(cfg, solver)
  File "script/run.py", line 28, in train_and_validate
    solver.evaluate("test")
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/core/engine.py", line 206, in evaluate
    pred, target = model.predict_and_target(batch)
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/tasks/task.py", line 27, in predict_and_target
    return self.predict(batch, all_loss, metric), self.target(batch)
  File "/home/lja/git_clone/NBFNet/nbfnet/task.py", line 277, in predict
    t_pred = self.model(graph, h_index, t_index, r_index, all_loss=all_loss, metric=metric)
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lja/git_clone/NBFNet/nbfnet/model.py", line 149, in forward
    output = self.bellmanford(graph, h_index[:, 0], r_index[:, 0])
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/utils/decorator.py", line 88, in wrapper
    result = forward(self, *args, **kwargs)
  File "/home/lja/git_clone/NBFNet/nbfnet/model.py", line 115, in bellmanford
    hidden = layer(step_graph, layer_input)
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/layers/conv.py", line 91, in forward
    update = self.message_and_aggregate(graph, input)
  File "/home/lja/git_clone/NBFNet/nbfnet/layer.py", line 140, in message_and_aggregate
    sum = functional.generalized_rspmm(adjacency, relation_input, input, sum="add", mul=mul)
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/layers/functional/spmm.py", line 378, in generalized_rspmm
    return Function.apply(sparse.coalesce(), relation, input)
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/layers/functional/spmm.py", line 172, in forward
    forward = spmm.rspmm_add_mul_forward_cuda
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/utils/torch.py", line 27, in __getattr__
    return getattr(self.module, key)
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/utils/decorator.py", line 21, in __get__
    result = self.func(obj)
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torchdrug-0.1.2-py3.8.egg/torchdrug/utils/torch.py", line 31, in module
    return cpp_extension.load(self.name, self.sources, self.extra_cflags, self.extra_cuda_cflags,
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1144, in load
    return _jit_compile(
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1382, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/lja/anaconda3/envs/NBFnet/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1776, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 556, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1166, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /home/lja/.cache/torch_extensions/spmm_0/spmm.so: cannot open shared object file: No such file or directory

I'm using torch 1.11 with CUDA 11.3, and torchdrug 0.1.2.

Do you know how to deal with this? Any help is appreciated! By the way, I noticed in other issues that an environment.yml would be released. Where can I find it? Thanks!
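The ImportError above reports that spmm.so is missing from the JIT build directory, which may mean the extension never finished compiling. As a rough diagnostic sketch (not part of NBFNet or torchdrug), the snippet below checks whether the library file exists, lists whatever a partial build left behind, and checks whether the ninja and nvcc executables are on the PATH, since PyTorch's JIT extension loader relies on ninja and CUDA sources need nvcc. The helper name diagnose_extension_cache is hypothetical; the directory is the one from the traceback.

```python
import shutil
from pathlib import Path


def diagnose_extension_cache(build_dir, lib_name="spmm.so"):
    """Report whether a JIT-built extension and its build tools are present.

    Hypothetical helper for illustration; it only inspects the file system
    and the PATH, and does not import torch or trigger a build.
    """
    build_dir = Path(build_dir).expanduser()
    exists = build_dir.is_dir()
    return {
        "build_dir_exists": exists,
        "library_exists": (build_dir / lib_name).is_file(),
        # Leftovers such as build.ninja or *.o without the .so suggest
        # that compilation started but did not finish.
        "leftover_files": sorted(p.name for p in build_dir.iterdir()) if exists else [],
        "ninja_on_path": shutil.which("ninja") is not None,
        "nvcc_on_path": shutil.which("nvcc") is not None,
    }


if __name__ == "__main__":
    # Build directory taken from the ImportError message above.
    report = diagnose_extension_cache("~/.cache/torch_extensions/spmm_0")
    for name, value in report.items():
        print(f"{name}: {value}")
```

If the library is missing and ninja or nvcc is not found, the compile step is the likely place to look before clearing the cache again.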

JiaangL commented 2 years ago

I'm now able to run the code by reimplementing the torchdrug.layers.functional.generalized_rspmm function myself in Python instead of CUDA. It takes about 1.5 hours to train one epoch on fb15k237-v1 with an RTX 3090. The model converges after one or two epochs. My code is below, and any suggestions are welcome.


import torch


def my_generalized_rspmm(sparse, relation, input, sum='add', mul='mul'):
    # Pure-Python fallback for the CUDA kernel; it loops over every edge, so it is slow.
    # Only mul='mul' is implemented, and edge weights (sparse._values()) are not applied.
    if mul != 'mul' or sum not in ('add', 'max', 'min'):
        raise NotImplementedError
    sparse_indices = sparse._indices()

    # Group edges by row index: row -> list of [column, relation] pairs.
    sparse_dict = dict()
    for i in range(sparse_indices.shape[-1]):
        key = sparse_indices[0, i].item()
        value = sparse_indices[1:, i].tolist()
        sparse_dict.setdefault(key, []).append(value)

    # Rows without any edge keep the initial value of zero.
    output = torch.zeros([sparse.shape[0], relation.shape[-1]],
                         dtype=input.dtype, device=input.device)
    for key, values in sparse_dict.items():
        # One message per edge: input[column] * relation[relation id].
        messages = torch.stack([torch.mul(input[col], relation[rel]) for col, rel in values])
        if sum == 'add':
            output[key] = messages.sum(dim=0)
        elif sum == 'max':
            output[key] = messages.max(dim=0).values
        else:
            output[key] = messages.min(dim=0).values

    return output
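
Most of the 1.5 hours per epoch probably goes into the per-edge Python loop. A possible vectorized variant of the sum='add', mul='mul' case is sketched below using index_add_. It is an illustration under stated assumptions, not torchdrug's implementation: the function name rspmm_add_mul is hypothetical, the sparse tensor is assumed to be 3-D with indices ordered (row, column, relation) as in my_generalized_rspmm, and, unlike the loop version, it multiplies each message by the edge weight stored in the sparse tensor (with unit weights the two agree).

```python
import torch


def rspmm_add_mul(sparse, relation, input):
    """Sum of weight * input[column] * relation[relation id] per row, without a Python loop.

    Assumes `sparse` is a 3-D sparse COO tensor with indices (row, column, relation).
    """
    sparse = sparse.coalesce()
    row, col, rel = sparse.indices()
    weight = sparse.values()
    # One message per edge, shape (num_edges, dim).
    message = weight.unsqueeze(-1) * input[col] * relation[rel]
    output = torch.zeros(sparse.shape[0], input.shape[-1],
                         dtype=message.dtype, device=message.device)
    # Accumulate the messages that share the same row index.
    return output.index_add_(0, row, message)


if __name__ == "__main__":
    # Tiny example: 3 nodes, 2 relations, feature dimension 2.
    indices = torch.tensor([[0, 0, 1],   # row
                            [1, 2, 0],   # column
                            [0, 1, 1]])  # relation
    adjacency = torch.sparse_coo_tensor(indices, torch.ones(3), (3, 3, 2))
    relation = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
    features = torch.tensor([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
    print(rspmm_add_mul(adjacency, relation, features))
```

This sketch materializes one message per edge, so it trades memory for speed; on large graphs the batch size may need to be reduced.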