NREL / gdx-pandas

Python interface to read and write GAMS GDX files using pandas.DataFrames as the intermediate data format.
BSD 3-Clause "New" or "Revised" License

Memory leak on read #78

Closed by jebob 3 years ago

jebob commented 3 years ago

Minimal example: RAM usage grows by about 100 KB/s on my machine.

import gdxpds
import pandas as pd

SIZE = 10

def make_big():
    y = [("a", i) for i in range(SIZE)]
    x = pd.DataFrame(y, columns=["i", "value"])
    gdxpds.to_gdx({"x": x}, "big.gdx")

def repeated_read():
    while True:
        x = gdxpds.to_dataframe("big.gdx", "x")

make_big()
repeated_read()

I looked into this using objgraph and found that atexit.register was storing the cleanup callback, which in turn kept the GDX object and its associated symbols alive.

import gdxpds
import pandas as pd
import objgraph

SIZE = 10

def make_big():
    y = [("a", i) for i in range(SIZE)]
    x = pd.DataFrame(y, columns=["i", "value"])
    gdxpds.to_gdx({"x": x}, "big.gdx")

def repeated_read():
    x = gdxpds.to_dataframe("big.gdx", "x")
    objgraph.show_growth(limit=20)
    for _ in range(10):
        x = gdxpds.to_dataframe("big.gdx", "x")
    objgraph.show_growth(limit=20)
    obj = objgraph.by_type("GdxFile")[10]
    objgraph.show_backrefs(obj, max_depth=10)

make_big()
repeated_read()

This produces a graph showing why a particular Python object (here a GdxFile) was still being referenced.

image
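
The retention pattern can be reproduced without gdxpds. The sketch below is only an illustration under assumptions: `Resource`, `Plain`, and `is_alive_after_del` are hypothetical names, not part of gdxpds. It shows that registering a bound method with `atexit.register` keeps the owning object alive, because the bound method holds a strong reference to `self` and atexit holds the bound method until interpreter exit.

```python
import atexit
import gc
import weakref


class Resource:
    """Hypothetical stand-in for an object that registers its own cleanup."""

    def __init__(self):
        # self.cleanup is a bound method, so it references self strongly;
        # atexit stores it until the interpreter exits.
        atexit.register(self.cleanup)

    def cleanup(self):
        pass


class Plain:
    """Same shape, but with no atexit registration."""


def is_alive_after_del(factory):
    """Create an object, drop the only local reference, report if it survives."""
    obj = factory()
    ref = weakref.ref(obj)
    del obj
    gc.collect()
    return ref() is not None


print(is_alive_after_del(Resource))  # True: atexit still references the object
print(is_alive_after_del(Plain))     # False: nothing else references it
```

Each call that constructs such an object adds one more entry to the atexit list, which matches the steady growth seen in the repeated-read loop.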

Commenting out the atexit.register call in Gdx prevented the memory leak in the first example.
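
One possible way to keep exit-time cleanup without pinning every object in memory is `weakref.finalize`, which references the object only weakly and, by default, still runs pending callbacks at interpreter exit. This is a sketch under assumptions, not the change made in gdxpds: `Resource`, `_release`, `handle`, and `released` are hypothetical names used only for illustration.

```python
import gc
import weakref

released = []  # records which handles were released, for demonstration only


def _release(handle):
    # Module-level callback: it receives the handle, not the owning object,
    # so it does not keep the object alive.
    released.append(handle)


class Resource:
    """Hypothetical object owning a handle that must be released once."""

    def __init__(self, handle):
        self.handle = handle
        # finalize keeps only a weak reference to self; the callback runs when
        # self is collected, when called explicitly, or at interpreter exit.
        self._finalizer = weakref.finalize(self, _release, handle)

    def cleanup(self):
        # Safe to call more than once: the callback runs at most one time.
        self._finalizer()


res = Resource("h1")
ref = weakref.ref(res)
del res
gc.collect()
print(ref() is None)  # True: the object was collected
print(released)       # ['h1']: cleanup ran when the object was collected
```

The key design point is that the callback and its arguments must not reference the object itself; otherwise the finalizer would keep it alive just as the bound method did.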