aaowens / PSID.jl

Quickly assemble data from the Panel Study of Income Dynamics (PSID)
MIT License

Possible memory leak #33

Open eirikbrandsaas opened 4 years ago

eirikbrandsaas commented 4 years ago

On my system (Linux, Pop!_OS 20.04), running makePSID("user_input.json") works fine, but it uses a lot of memory (~10 GB). However, for some reason the memory is never released after the function is done running. In fact, at one point it crashed my computer.

The memory usage increases when the package reads in the data (unzip_data.jl):

datas = SortedDict(year => readPSID(filename) for (year, filename) in zip(years, filenames))

I don't understand why the memory used by datas isn't released (or whatever the proper terminology is) after the function finishes.

What I have tried to do:

  1. Added GC.gc() after some of the function calls in makePSID. Maybe this helped a little? It seemed to reduce my memory use to ~80%, so clearly something else is still going on.
    famdatas, inddata = PSID.unzip_data()
    GC.gc()
    println("Constructing data")
    PSID.construct_alldata(famdatas, inddata, codemissings = codemissings)
    GC.gc()
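
One way to narrow this down is to compare what Julia's garbage collector considers live before and after the load, separately from what the OS reports for the process. The sketch below is only an illustration: `load_and_summarize` is a hypothetical stand-in for the package's reading step (it is not part of PSID.jl), and the sizes are arbitrary. If the live heap falls back to roughly its starting value while the process still looks large in a system monitor, the data was probably collected but the memory not handed back to the OS; if the live heap stays high, something is likely still holding a reference.

```julia
# Hypothetical stand-in for the data-reading step (NOT a PSID.jl function):
# builds a large temporary Dict and returns only a small summary value.
function load_and_summarize(n::Integer)
    datas = Dict(year => rand(n) for year in 1:10)  # large temporary allocation
    return sum(length(v) for v in values(datas))
end

# Bytes of live objects tracked by Julia's GC, reported in MiB.
live_mib() = Base.gc_live_bytes() / 2^20

GC.gc()
before = live_mib()

total = load_and_summarize(1_000_000)

GC.gc()  # full collection; `datas` is unreachable once the function has returned
after = live_mib()

println("elements read:          ", total)
println("live heap before:       ", round(before, digits = 1), " MiB")
println("live heap after:        ", round(after, digits = 1), " MiB")
println("peak resident set size: ", round(Sys.maxrss() / 2^20, digits = 1), " MiB")
```

`Sys.maxrss()` reports the peak resident set size of the process, so it will not go down after a collection; it is printed here only to show how large the process got compared with what the GC still counts as live.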