deeptools / pyBigWig

A python extension for quick access to bigWig and bigBed files
MIT License

Writing a bigwig file in python3 is extremely slow compared to python2 #90

Closed Young-Sook closed 3 years ago

Young-Sook commented 5 years ago

In my script, one of the functions (mergeFiles) reads a lot of bedgraph files and merges them into a single bigwig file using pyBigWig. I ran exactly the same script in python 2.7 and python 3.7. It took less than 1 hour in python 2.7, but > 6 days in python 3.7.

Also, 15GB was enough for the function in python 2.7, but in python 3.7 the script crashes even when I allocate 250GB. When I remove the part that writes to bigwig files, there are no running time or memory problems.

Is there any reason why the performance of pyBigWig is so different in the two versions of python?

This is my function:

def mergeFiles(fileNames):
        bw = pyBigWig.open("new.bw", "w")
        bw.addHeader(bwHeader)
        for i in range(len(fileNames)):
                tempSignalBedName = fileNames[i]
                tempFile_stream = open(tempSignalBedName)
                tempFile = tempFile_stream.readlines()

                for j in range(len(tempFile)):
                        temp = tempFile[j].split()
                        regionChromo = temp[0]
                        regionStart = int(temp[1])
                        regionEnd = int(temp[2])
                        regionValue = float(temp[3])

                        bw.addEntries([regionChromo], [regionStart], ends=[regionEnd], values=[regionValue])

                tempFile_stream.close()
        bw.close()

A side note: I am using multiprocessing for the mergeFiles function.

pool = multiprocessing.Pool(numSamples)
result = pool.map_async(mergeFiles, joblist).get()
pool.close()
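For anyone hitting a similar slowdown: calling addEntries once per interval pays per-call overhead for every line of the bedgraph file, and pyBigWig's addEntries also accepts parallel lists of chromosomes, starts, ends, and values. A minimal sketch of the same merge with one vectorized call per file (the `parse_bedgraph` helper and `merge_files` names are illustrative, not part of the original script; entries must be pre-sorted, as in the original):

```python
def parse_bedgraph(lines):
    """Split bedGraph lines into parallel chrom/start/end/value lists."""
    chroms, starts, ends, values = [], [], [], []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chroms.append(fields[0])
        starts.append(int(fields[1]))
        ends.append(int(fields[2]))
        values.append(float(fields[3]))
    return chroms, starts, ends, values


def merge_files(file_names, header):
    # Illustrative rewrite of the mergeFiles function above.
    import pyBigWig
    bw = pyBigWig.open("new.bw", "w")
    bw.addHeader(header)
    for name in file_names:
        with open(name) as fh:
            chroms, starts, ends, values = parse_bedgraph(fh)
        # One batched call per file instead of one call per interval.
        bw.addEntries(chroms, starts, ends=ends, values=values)
    bw.close()
```

Whether batching alone closes the python2/python3 gap depends on where the time goes, but it reduces the number of C-extension calls by orders of magnitude for large files.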
dpryan79 commented 5 years ago

The memory issue was probably the same as that from #91, which should now be fixed (I'm pushing out a new version now). Can you try this again with version 0.3.17?