Closed · nicolazilio closed this 6 years ago
Open the bigWig file inside extract_data()
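For illustration, a minimal sketch of that pattern, assuming a bigWig file at `signal.bw`, a list of `(chrom, start, end)` regions, and a pool size of 4 (the path, regions, and pool size are placeholders, not from the original thread):

```python
from multiprocessing import Pool

import pyBigWig

BIGWIG_PATH = "signal.bw"  # placeholder path


def extract_data(chrom, start, end):
    # Open the bigWig inside the worker so each process gets its own handle,
    # instead of sharing one handle opened in the parent process.
    bw = pyBigWig.open(BIGWIG_PATH)
    try:
        return bw.values(chrom, start, end)
    finally:
        bw.close()


if __name__ == "__main__":
    regions = [("chr1", 0, 100000), ("chr1", 100000, 200000)]  # example regions
    with Pool(processes=4) as pool:
        values = pool.starmap(extract_data, regions)
```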
I tried that. In principle it works, in the sense that I don't get errors, but the problem is that with that setup the computing time also increases as I increase the number of processes.
For small regions that are near each other, parallelizing won't help you. Once you have 100 kb or megabase-sized regions, the overhead of reading and decompressing is no longer rate limiting. In general, opening files inside the worker processes is the only way to reliably access files in parallel with Python.
I have done some more research, and it seems that, as you pointed out, increasing the number of processes indeed does not help much. However, the biggest reason for the slowdown I was seeing was actually appending new rows to the pandas DataFrame thousands of times. I changed that to writing to a file directly and things improved a LOT.
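As a rough sketch of that change (the function name, output path, and result layout below are made up for illustration): stream each region's result straight to disk as it arrives, instead of appending rows to a DataFrame in a loop, which with `DataFrame.append` copies the whole frame on every call.

```python
import csv


def write_region_values(results, out_path="region_values.tsv"):  # hypothetical names
    # Write results directly to a TSV instead of growing a DataFrame row by row.
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["chrom", "start", "end", "mean"])
        for chrom, start, end, mean in results:
            writer.writerow([chrom, start, end, mean])
```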
Thanks again for the help.
Glad you got things resolved!
Hi there,
First of all, I'd like to say that pyBigWig and deepTools are awesome tools. Thanks a lot for creating them.
I have been trying to parallelize pyBigWig.values() with the multiprocessing library, without success. Essentially, what I tried to do is something like this:
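(The code block from the original post did not survive in this copy of the thread. A hypothetical reconstruction of the kind of setup being described, with an illustrative path and regions, might look like the following: the bigWig is opened once in the parent process and its handle is reused inside the Pool workers.)

```python
from multiprocessing import Pool

import pyBigWig

bw = pyBigWig.open("signal.bw")  # illustrative path; opened once in the parent


def extract_data(region):
    chrom, start, end = region
    # Each worker reuses the parent's file handle across the fork.
    return bw.values(chrom, start, end)


if __name__ == "__main__":
    n = 2  # number of processes
    regions = [("chr1", 0, 100000), ("chr1", 100000, 200000)]
    with Pool(processes=n) as pool:
        results = pool.map(extract_data, regions)
```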
This works fine if n = 1, but for n > 1 I receive an error saying that there was a problem getting values.
Any ideas as to how to accomplish this?
Thanks a lot in advance