dmlc / dgl

Python package built to ease deep learning on graph, on top of existing DL frameworks.
http://dgl.ai
Apache License 2.0
13.55k stars, 3.01k forks

Better understanding on the usage of memory (allocation/deallocation) of a python process - distributed graph partitioning pipeline #4784

Open kylasa opened 2 years ago

kylasa commented 2 years ago

🔨Work Item

This issue will be used to track the discussion of memory-related issues in a Python process (in the context of the distributed graph partitioning pipeline).

The goal of this effort is to handle a very large number of nodes/edges per graph partition on each EC2 instance during the pipeline's normal processing.

Currently we read "/proc/[pid]/status" to print the pipeline's memory usage over the course of its execution. We take a snapshot of the following items:

VmPeak - Peak virtual memory size reached so far during the execution of the pipeline
VmSize - Current virtual memory size, i.e. the total address space used by the process
VmRSS - Resident set size, i.e. the portion of the process's memory that is in RAM at the moment
VmData - Size of the data segment of the current process
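For illustration, a snapshot like the one described above could be taken roughly as follows. This is a minimal sketch, not the pipeline's actual code: the helper names `parse_status`, `memory_snapshot` and `log_memory` are hypothetical, and it assumes a Linux host where "/proc/[pid]/status" reports these fields in kB.

```python
import os

# Fields named in this issue; /proc/[pid]/status reports their values in kB.
FIELDS = ("VmPeak", "VmSize", "VmRSS", "VmData")


def parse_status(text, fields=FIELDS):
    """Return {field: size_in_kB} for the requested fields found in text."""
    usage = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in fields:
            # A typical line looks like "VmRSS:\t  123456 kB".
            usage[key] = int(value.split()[0])
    return usage


def memory_snapshot(pid=None):
    """Read /proc/<pid>/status (Linux only) for the given or current process."""
    pid = os.getpid() if pid is None else pid
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())


def log_memory(tag):
    """Print one snapshot line, labelled with a hypothetical stage tag."""
    snap = memory_snapshot()
    print(tag, ", ".join(f"{k}={v / 1024:.1f} MiB" for k, v in snap.items()))
```

Calling something like `log_memory("after loading edges")` at a few stages of the pipeline would show whether VmRSS falls back after large arrays are released or stays close to its earlier level.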

IMPORTANT:

Project tracker: https://github.com/orgs/dmlc/projects/2

Description

Depending work items or issues

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale due to lack of activity. It will be closed if no further activity occurs. Thank you