fff-2013 · issue closed 4 years ago
After deleting the req/res pointers in edge_sampler.py and neighbor_sampler.py, the memory leak is gone:
```diff
diff --git a/graphlearn/python/sampler/edge_sampler.py b/graphlearn/python/sampler/edge_sampler.py
index 4103ca1..79ab314 100644
--- a/graphlearn/python/sampler/edge_sampler.py
+++ b/graphlearn/python/sampler/edge_sampler.py
@@ -83,6 +83,8 @@ class EdgeSampler(object):
                             src_ids,
                             dst_ids)
     edges.edge_ids = edge_ids
+    pywrap.del_get_edge_req(req)
+    pywrap.del_get_edge_res(res)
     return edges
diff --git a/graphlearn/python/sampler/neighbor_sampler.py b/graphlearn/python/sampler/neighbor_sampler.py
index 1757f12..b912e50 100644
--- a/graphlearn/python/sampler/neighbor_sampler.py
+++ b/graphlearn/python/sampler/neighbor_sampler.py
@@ -124,6 +124,8 @@ class NeighborSampler(object):
       current_batch_size = nbr_ids_flat.size
       src_ids = nbr_ids
+    pywrap.del_nbr_req(req)
+    pywrap.del_nbr_res(res)
     return layers
 
   def _make_req(self, index, src_ids):
@@ -200,4 +202,6 @@ class FullNeighborSampler(NeighborSampler):
       current_batch_size = nbr_ids_flat.size
       src_ids = nbr_ids
+    pywrap.del_nbr_req(req)
+    pywrap.del_nbr_res(res)
     return layers
```
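One possible hardening of the same fix, sketched below: wrap the request/response lifetime in a context manager so the SWIG-wrapped C++ objects are freed even if an exception is raised before the explicit del calls run. Only the four `pywrap.del_*` functions shown in the diffs are taken from graph-learn; the context manager itself and its usage are illustrative, not the project's actual API.

```python
import contextlib

@contextlib.contextmanager
def scoped_req_res(req, res, del_req, del_res):
  """Yield (req, res) and delete both wrapped C++ objects on exit,
  even when the body raises. del_req/del_res are the pywrap deleters
  from the diffs above, e.g. pywrap.del_nbr_req / pywrap.del_nbr_res."""
  try:
    yield req, res
  finally:
    del_req(req)
    del_res(res)

# Hypothetical usage inside NeighborSampler.get():
#   with scoped_req_res(req, res, pywrap.del_nbr_req, pywrap.del_nbr_res):
#     ... unpack the response and build the layers ...
```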
When I went to provide feedback, I saw #46, which includes even more modifications. Awesome!
@fff-2013 Sorry for the trouble, and thanks for pointing out the problem. We've fixed it; you can try again.
Problem description
When I run the graphsage dist_train.py (Cora data), the worker memory usage keeps increasing.
When I train the model on our own data, which is a larger graph, the memory usage grows even faster.
I suspect there is a memory leak; maybe some objects from previous iterations are not freed? Any advice or suggestions would be greatly appreciated.
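To confirm whether this is a leak rather than, say, allocator fragmentation, one way is to log the worker's resident memory per training step. A minimal sketch using psutil, which is not part of the example scripts; the step interval is arbitrary:

```python
import os
import psutil  # pip install psutil

_proc = psutil.Process(os.getpid())

def log_rss(step, every=100):
  """Print the worker's resident set size (RSS) every `every` steps;
  an RSS that climbs steadily across iterations points to unreleased objects."""
  if step % every == 0:
    rss_mb = _proc.memory_info().rss / (1024.0 * 1024.0)
    print("step %d: RSS %.1f MB" % (step, rss_mb))
```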
Environment information (Cora data)
docker image: registry.cn-zhangjiakou.aliyuncs.com/pai-image/graph-learn:v0.1-cpu
code path: /workspace/graph-learn/examples/tf/graphsage (inside the Docker container)
config: 2 PS, 2 workers / batch size: 32 / epochs: 40000000