INK-USC / RE-Net

Recurrent Event Network: Autoregressive Structure Inference over Temporal Knowledge Graphs (EMNLP 2020)
http://inklab.usc.edu/renet/

Issue of running pretrain.py #28

Closed: ngl567 closed this issue 4 years ago

ngl567 commented 4 years ago

Hello, I got an error when running pretrain.py. I hope you can help me solve this problem. The error information is shown as follows:

Traceback (most recent call last):
  File "pretrain.py", line 139, in <module>
    train(args)
  File "pretrain.py", line 92, in train
    model.global_emb = model.get_global_emb(train_times_origin, graph_dict)
  File "/gs/home/beihangngl/RE-Net-master/global_model.py", line 67, in get_global_emb
    emb, _, _ = self.predict(t, graph_dict)
  File "/gs/home/beihangngl/RE-Net-master/global_model.py", line 88, in predict
    rnn_inp = self.aggregator.predict(t, self.ent_embeds, graph_dict, reverse=reverse)
  File "/gs/home/beihangngl/RE-Net-master/Aggregator.py", line 96, in predict
    batched_graph = dgl.batch(g_list)
  File "/gs/home/beihangngl/anaconda/envs/pytorch-gnn/lib/python3.8/site-packages/dgl/graph.py", line 4187, in batch
    cols = {key: F.cat([gr._node_frame[key] for gr in graph_list
  File "/gs/home/beihangngl/anaconda/envs/pytorch-gnn/lib/python3.8/site-packages/dgl/graph.py", line 4187, in <dictcomp>
    cols = {key: F.cat([gr._node_frame[key] for gr in graph_list
  File "/gs/home/beihangngl/anaconda/envs/pytorch-gnn/lib/python3.8/site-packages/dgl/backend/pytorch/tensor.py", line 141, in cat
    return th.cat(seq, dim=dim)
RuntimeError: Expected object of backend CUDA but got backend CPU for sequence element 1 in sequence argument at position #1 'tensors'

Hoping for your help. Thank you.

rwer81 commented 4 years ago

I got the same error. Any solution? Thanks.

woojeongjin commented 4 years ago

Hi, thank you for your interest. Did you run on a GPU? Also, you should specify the GPU device number, e.g., -gpu 0
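
For example, a quick sanity check in plain PyTorch (nothing repo-specific) to confirm a CUDA device is actually visible before passing an index to -gpu:

    import torch

    print(torch.cuda.is_available())   # must be True to run on a GPU
    print(torch.cuda.device_count())   # the index given to -gpu must be smaller than this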

rwer81 commented 4 years ago

Hi, I think the problem is at https://github.com/INK-USC/RE-Net/blob/e28e41611a700368d45aa52191029f28120d3028/Aggregator.py#L97, where "move_dgl_to_cuda(batched_graph)" is called after batching. When this line is reached from the "get_global_emb" function in global_model.py (https://github.com/INK-USC/RE-Net/blob/e28e41611a700368d45aa52191029f28120d3028/global_model.py#L67), the first loop iteration moves the first graph in "g_list" to CUDA. On the second iteration, len(g_list) = 2, but the second element of g_list is still on the CPU, so dgl.batch tries to batch two graphs that live on different devices (the first on GPU, the second on CPU). I changed the code a bit, and now https://github.com/INK-USC/RE-Net/blob/e28e41611a700368d45aa52191029f28120d3028/Aggregator.py#L93 is:

    for tim in timess:
        # Move each graph to CUDA before appending, so that every element
        # of g_list is on the same device by the time dgl.batch is called.
        move_dgl_to_cuda(graph_dict[tim.item()])
        g_list.append(graph_dict[tim.item()])

As you can see, before appending a graph to g_list, I move it to CUDA first. That solved the problem. Thanks.
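
For anyone who wants a standalone repro of the device mismatch, here is a minimal sketch. It uses the current DGL API, where a graph has a .to() method; in the older DGL this repo uses, the move_dgl_to_cuda helper plays that role and the exact error message differs, but the failure mode is the same: dgl.batch cannot combine graphs whose tensors live on different devices.

    import torch
    import dgl

    # Two structurally identical graphs, each with one node feature.
    g1 = dgl.graph(([0, 1], [1, 2]))
    g1.ndata['h'] = torch.randn(3, 4)
    g2 = dgl.graph(([0, 1], [1, 2]))
    g2.ndata['h'] = torch.randn(3, 4)

    g1 = g1.to('cuda:0')    # first graph moved to the GPU; g2 is still on the CPU

    try:
        dgl.batch([g1, g2])                       # mixed devices: this raises
    except Exception as e:
        print(type(e).__name__, e)

    batched = dgl.batch([g1, g2.to('cuda:0')])    # same device: works
    print(batched.device)                         # cuda:0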

woojeongjin commented 4 years ago

Hi, thanks for your comment! I have updated my code!

ngl567 commented 4 years ago

Thanks a lot for your suggestion.