Better understanding on the usage of memory (allocation/deallocation) of a python process - distributed graph partitioning pipeline

🔨Work Item

This issue tracks the discussion of memory-related issues of a Python process, in the context of the distributed graph partitioning pipeline.

The goal of this effort is to handle a very large number of nodes/edges per graph partition on each EC2 instance during normal pipeline processing.

Currently we read "/proc/[pid]/status" to print the pipeline's memory usage during its execution. We take a snapshot of the following items:

VmPeak - Peak virtual memory size reached so far during the execution of the pipeline
VmSize - Current virtual memory size, i.e. the total size of the address space used by the process
VmRSS - Resident set size, i.e. the portion of the process's memory that is in RAM at the moment
VmData - Size of the data segment of the current process
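The snapshot described above could be taken along the following lines. This is a minimal sketch, not the pipeline's actual code: the helper names `parse_status` and `memory_snapshot` are hypothetical, and it assumes a Linux host where "/proc/[pid]/status" reports these fields in kB.

```python
import os

# Fields snapshotted by the pipeline, as listed above.
FIELDS = ("VmPeak", "VmSize", "VmRSS", "VmData")


def parse_status(text, fields=FIELDS):
    """Extract the selected fields (in kB) from the text of /proc/[pid]/status.

    Hypothetical helper for illustration only.
    """
    usage = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in fields:
            # Values look like "   123456 kB"; keep the integer part.
            usage[key] = int(value.split()[0])
    return usage


def memory_snapshot(pid=None):
    """Read /proc/[pid]/status for the given pid (default: this process)."""
    pid = os.getpid() if pid is None else pid
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())
```

Calling `memory_snapshot()` at chosen points in the pipeline (for example before and after loading a partition) gives a dictionary of the four values that can be logged and compared across stages.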

IMPORTANT:

This template is only for the dev team to track project progress. For feature requests or bug reports, please use the corresponding issue templates.
DO NOT create a new work item if the purpose is to fix an existing issue or feature request. We will directly use the issue in the project tracker.

Project tracker: https://github.com/orgs/dmlc/projects/2

dmlc / dgl

Better understanding on the usage of memory (allocation/deallocation) of a python process - distributed graph partitioning pipeline #4784
