deeptools / pyBigWig

A python extension for quick access to bigWig and bigBed files
MIT License

Is memory released on bw.close() ? #91

Closed PanosFirmpas closed 5 years ago

PanosFirmpas commented 5 years ago

Hi,

I noticed that a script of mine that creates a lot of bw files balloons its memory usage until no more memory can be allocated and the system crashes.

I cannibalized some of my code to make the following tiny test script. It creates a number of small bw files, opening and closing a bw file in each iteration:

import errno
import sys

import pyBigWig
import SharedArray as sha

def add_data_to_bw(bw):
    """Read data from shared memory and add it to the pyBigWig object."""
    seedstring = "a_convoluted_path_in_SharedMem_{}"
    try:
        sha_starts = sha.attach(seedstring.format("starts"))
        sha_ends = sha.attach(seedstring.format("ends"))
        sha_values = sha.attach(seedstring.format("values"))
        mysize = len(sha_starts)

        bw.addEntries(['chrom'] * mysize, sha_starts, ends=sha_ends, values=sha_values)
        # sha.delete(seedstring.format("starts"))
        # sha.delete(seedstring.format("ends"))
        # sha.delete(seedstring.format("values"))
    except OSError as e:
        if e.errno == errno.ENOENT:
            # The shared-memory segment was not found
            print("I expected to find this but couldn't: {}".format(seedstring.format("{}")), file=sys.stderr)
        else:
            raise
loc = []
with open("chromInfo_mm10.txt", "r") as fi:
    for line in fi:
        c, v = line.rstrip().split("\t")[:2]
        loc.append((c, int(v)))
chrom_size = dict(loc)

for i in range(50):
    bw = pyBigWig.open("./TBD/testito{}.bw".format(i), "w")
    bw.addHeader(sorted(chrom_size.items()), maxZooms=10)
    add_data_to_bw(bw)
    bw.close()

I profiled the memory used with memory_profiler and got the following:

(memory profile plot: mprotest2)
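For a quick spot-check without memory_profiler, the peak resident set size can also be sampled with the standard library's `resource` module (a stdlib-only sketch, not what was used above; Unix-only):

```python
import resource

def peak_rss_kb():
    """Peak resident set size of this process so far.

    Note: ru_maxrss is in kilobytes on Linux but in bytes on macOS,
    so only compare values taken on the same platform.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

# Sample this before and after each open/write/close cycle; a peak
# that keeps growing across iterations points at a leak.
print(peak_rss_kb())
```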

Am I doing something wrong? I would expect the memory to be released every time .close() is called.

To further prove my point, I made a script that calls the bw-handling script multiple times (with os.system). Now the whole process is properly closed every time and the memory profile behaves like I would expect: ![mprotest_alt](https://user-images.githubusercontent.com/3511306/61793654-e45c8a80-ae1f-11e9-884d-70b9185bcb08.png)
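The process-per-file workaround above can be sketched with `subprocess` instead of `os.system`, so any memory the C extension leaks is reclaimed by the OS when each child exits (a minimal illustration; the worker script name is hypothetical):

```python
import subprocess
import sys

def run_isolated(args):
    """Run one unit of work in a fresh Python interpreter so that any
    memory it leaks is reclaimed by the OS when the child exits."""
    return subprocess.run([sys.executable] + args, check=False).returncode

# Each bigWig file gets its own short-lived process.
# "make_one_bw.py" is a hypothetical worker script that writes file i:
# for i in range(50):
#     run_isolated(["make_one_bw.py", str(i)])
```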
dpryan79 commented 5 years ago

Hmm, it's quite possible that I missed a free() somewhere. I'm away at a conference this week, so I'll have a look in the next couple weeks.

dpryan79 commented 5 years ago

FYI, I'm making progress on tracking down the leak, it's happening during the step where the intervals are validated to ensure they're in order. In the interim, if you add the validate=False option when you add the entries then you can avoid the problematic code. I'm hoping to figure out exactly where the problem is and get a fix out in the next day.
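For context, the validation that leaked conceptually checks that each entry's interval is well-formed and in sorted, non-overlapping order. A rough pure-Python equivalent of that check (a conceptual sketch only, not pyBigWig's actual C implementation):

```python
def intervals_sorted(starts, ends):
    """Check that intervals are well-formed (start < end) and sorted by
    start with no overlaps -- roughly what addEntries validates before
    writing when validate=True (the default)."""
    for i, (s, e) in enumerate(zip(starts, ends)):
        if s >= e:
            return False          # empty or inverted interval
        if i > 0 and s < ends[i - 1]:
            return False          # out of order or overlapping
    return True
```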

dpryan79 commented 5 years ago

Can you try the fix91 branch and see if that solves the issue for you?

PanosFirmpas commented 5 years ago

Yes! The test script looks much better now! I'll see if the big one can finish without crashing as well. This is great, thanks for looking into it!

(memory profile plot: mprotest_nu)

dpryan79 commented 5 years ago

Great, let me know how the larger version does and I'll get this pushed out as a new release.

PanosFirmpas commented 5 years ago

I had a couple of unexpected errors, but it turns out I was running out of disk space! Memory-wise everything seems super smooth now, cheers for the fix!