Open yufengwhy opened 2 years ago
Hi, thank you for your interest in our work. Yes, you can use multiple GPUs with this implementation by allocating shared memory space and pinning it with the unified tensor. However, we are pushing our idea into the DGL repository and some upgrades are coming soon (https://github.com/dmlc/dgl/pull/3616), so you can take a look there as well!
I think huge matrix indexing/multiplication is a very basic operation that is not specific to graphs — could it be implemented as a standalone primitive, not coupled to the graph?
Hi, yes, the latter link is graph-specific (if you need that), but DGL also supports the unified tensor on its own. Please see the documentation here: https://docs.dgl.ai/en/latest/api/python/dgl.contrib.UnifiedTensor.html. You simply need to declare the unified tensor on a shared memory space.
Some quick examples (may need some syntax corrections):
For the single GPU case:

```python
import dgl
import torch

def train(feat_matrix, ...):
    feat_matrix_unified = dgl.contrib.UnifiedTensor(feat_matrix, device=torch.device('cuda'))
    # Access feat_matrix_unified from here
    # e.g., a = feat_matrix_unified[indices]
    # !!! "indices" must be a cuda tensor !!!

if __name__ == '__main__':
    ...
    feat_matrix = dataload()  # some user-defined function here to load data into a torch tensor
    train(feat_matrix, ...)
```
For the multi GPU case:

```python
import dgl
import torch
import torch.multiprocessing as mp

def train(feat_matrix, device, ...):
    torch.cuda.set_device(device)
    feat_matrix_unified = dgl.contrib.UnifiedTensor(feat_matrix, device=torch.device(device))
    # Access feat_matrix_unified from here
    # e.g., a = feat_matrix_unified[indices]
    # !!! "indices" must be a cuda tensor !!!

if __name__ == '__main__':
    ...
    feat_matrix = dataload()  # some user-defined function here to load data into a torch tensor
    feat_matrix = feat_matrix.share_memory_()
    ...
    procs = []
    for proc_id in range(n_gpus):
        # pass the per-process GPU id (proc_id), not n_gpus, as the device
        p = mp.Process(target=train, args=(feat_matrix, proc_id, ...))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
```
Hope this helps!
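As a side note on the decoupling question above: the host-resident-matrix pattern can be sketched in plain PyTorch with no graph library at all — keep the large matrix in (optionally pinned) CPU memory and gather row chunks to the GPU on demand. This is a hypothetical illustration, not DGL's API: `gather_rows` and the chunk size are made up for the example, and it is an ordinary staged copy rather than the zero-copy UVA access that `dgl.contrib.UnifiedTensor` provides.

```python
import torch

def gather_rows(cpu_matrix, indices, device, chunk=1024):
    """Gather rows of a large CPU-resident matrix onto `device` in chunks.

    Staged copy (CPU gather + host-to-device transfer); it only shows that
    the access pattern is independent of any graph structure.
    """
    out = torch.empty(len(indices), cpu_matrix.shape[1],
                      dtype=cpu_matrix.dtype, device=device)
    for start in range(0, len(indices), chunk):
        idx = indices[start:start + chunk]
        # index_select runs on the CPU; the H2D copy can overlap with compute
        # when cpu_matrix lives in pinned memory
        rows = cpu_matrix.index_select(0, idx)
        out[start:start + len(idx)] = rows.to(device, non_blocking=True)
    return out

if __name__ == '__main__':
    feat = torch.randn(10_000, 16)
    if torch.cuda.is_available():
        feat = feat.pin_memory()      # pinning enables async H2D copies
        device = torch.device('cuda')
    else:
        device = torch.device('cpu')  # fallback for CPU-only machines
    idx = torch.randint(0, feat.shape[0], (3_000,))
    gathered = gather_rows(feat, idx, device)
    assert torch.equal(gathered.cpu(), feat[idx].cpu())
```

The chunked loop matters at the scales discussed in this thread: gathering all rows at once would materialize a second huge buffer, while fixed-size chunks bound the staging memory.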
Can we use this code with multiple GPUs? If so, could you give some examples in the README? thx~
Say there are 1 billion nodes and 60 billion edges, so the feature matrix will be ~500 GB, while an A100 has only 80 GB of memory.
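For scale, the numbers in the question can be sanity-checked quickly. The feature dimension is never stated in the thread, so the value below is only what the quoted 500 GB figure implies under an assumed float32 dtype:

```python
num_nodes = 1_000_000_000            # 1 billion nodes, from the question
bytes_per_float32 = 4                # assumed dtype
total_bytes = 500 * 10**9            # the ~500 GB feature matrix

# implied feature dimension per node
feats_per_node = total_bytes / (num_nodes * bytes_per_float32)
print(feats_per_node)                # 125.0 -> ~125 float32 features per node

# how many times larger than a single A100's memory
gpu_bytes = 80 * 10**9
print(total_bytes / gpu_bytes)       # 6.25 -> over 6x one GPU's capacity
```

So even before counting the graph structure itself, the feature matrix alone cannot fit on one device, which is exactly the case the unified-tensor approach targets.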