aidenlab / Juicebox

Visualization and analysis software for Hi-C data -
https://aidenlab.org/juicebox
MIT License

Slowdown when loading 1D and 2D tracks #162

Closed sanjitsbatra closed 9 years ago

sanjitsbatra commented 9 years ago

For the purpose of assembly, we sometimes load up to 1M 2D annotations on a single chromosome, and even denser 1D tracks. Doing so makes Juicebox really slow. It's not a RAM problem: even with 30GB of RAM available, the process only uses 6GB. It's a CPU problem, in that at every moment Juicebox is trying to search through many annotations. Further, if we just uncheck the "show annotations" box, it returns to its original speed. This clearly suggests a need for better (perhaps sparse) handling of the annotations.

nchernia commented 9 years ago

I don't see how we could do sparse 2D annotations. 1M per chromosome is an awful lot.


Neva Cherniavsky Durand, Ph.D. Staff Scientist, Aiden Lab www.aidenlab.org

nchernia commented 9 years ago

Jim notes the following, which I think is true:

I don't want to reply to GitHub because I'm not sure, but I think Juicebox supports BigBed files, as well as tabix-indexed files, for the 1D tracks. Assuming that's correct, I would recommend using one of those formats for dense annotations, preferably BigBed. That should solve the 1D problem; not sure about 2D.
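Indexed formats help because a viewer can read only the records overlapping the region on screen instead of the whole file. As a rough illustration, tabix (.tbi) indexes use the same hierarchical binning scheme as BAM indexes, described in the SAM/BAM specification; BigBed uses its own R-tree-based index, not shown here. The Java sketch below follows the spec's `reg2bin`/`reg2bins` functions. The class name `BinningIndex` is hypothetical, and this is not Juicebox or tabix code.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of the SAM/BAM binning scheme (also used by tabix indexes).
 * Coordinates are 0-based, half-open [beg, end). An index maps each bin
 * to file offsets, so only records in the bins from reg2bins() are read.
 */
public final class BinningIndex {

    /** Smallest bin that fully contains [beg, end). */
    public static int reg2bin(int beg, int end) {
        --end;
        if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14); // 16 kb bins
        if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17); // 128 kb bins
        if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);  // 1 Mb bins
        if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);  // 8 Mb bins
        if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);  // 64 Mb bins
        return 0;                                                              // whole sequence
    }

    /** Every bin that may hold features overlapping [beg, end). */
    public static List<Integer> reg2bins(int beg, int end) {
        List<Integer> bins = new ArrayList<>();
        --end;
        bins.add(0);
        for (int k = 1 + (beg >> 26); k <= 1 + (end >> 26); ++k) bins.add(k);
        for (int k = 9 + (beg >> 23); k <= 9 + (end >> 23); ++k) bins.add(k);
        for (int k = 73 + (beg >> 20); k <= 73 + (end >> 20); ++k) bins.add(k);
        for (int k = 585 + (beg >> 17); k <= 585 + (end >> 17); ++k) bins.add(k);
        for (int k = 4681 + (beg >> 14); k <= 4681 + (end >> 14); ++k) bins.add(k);
        return bins;
    }

    public static void main(String[] args) {
        System.out.println(reg2bin(0, 1));      // 4681: fits in one 16 kb leaf bin
        System.out.println(reg2bin(0, 20_000)); // 585: spans two 16 kb bins, fits one 128 kb bin
        System.out.println(reg2bins(0, 1));     // [0, 1, 9, 73, 585, 4681]
    }
}
```

A query for a small window touches one bin per level plus a handful of leaf bins, so the work scales with the features on screen rather than with the size of the file.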


sanjitsbatra commented 9 years ago

I see. At each moment, when a 2D track is loaded, what is Juicebox trying to search for? Maybe we can modify that slightly and make it computationally less heavy? I don't quite know how this works, so maybe this isn't possible.

nchernia commented 9 years ago

It reads the file and loads all the 2D annotations. There's no search.

1M annotations per chromosome seems to me to be outside the original scope of the 2D annotations tool.


sanjitsbatra commented 9 years ago

I see. But I wanted to point out one curious thing.

With the tracks loaded, if we simply uncheck "show 2D annotations", the RAM assigned to Juicebox does not change, but the speed goes back to what it was before the tracks were loaded. Why would this be happening? Is this something you would have expected?

nchernia commented 9 years ago

Probably the slowdown is due to drawing all the 2D annotations. We could look at the code and see if it could be more efficient. But I suspect you need a different way to visualize whatever you're looking at.
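If drawing is the bottleneck, one common remedy is viewport culling: on each repaint, hand the renderer only the features whose rectangle intersects the visible region. Below is a minimal Java 16+ sketch, assuming features are axis-aligned half-open rectangles in genomic coordinates; `ViewportCuller` and `Annotation2D` are hypothetical names, not Juicebox classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical viewport culling for 2D annotations (not Juicebox code). */
public final class ViewportCuller {

    /** Hypothetical stand-in for a 2D annotation: [x1, x2) x [y1, y2). */
    public record Annotation2D(long x1, long x2, long y1, long y2) {}

    private final Annotation2D[] byX1; // sorted once by x1
    private final long maxWidth;       // widest feature along x

    public ViewportCuller(List<Annotation2D> features) {
        byX1 = features.toArray(new Annotation2D[0]);
        Arrays.sort(byX1, Comparator.comparingLong(Annotation2D::x1));
        long w = 0;
        for (Annotation2D f : byX1) {
            w = Math.max(w, f.x2() - f.x1());
        }
        maxWidth = w;
    }

    /** Features intersecting the visible region [vx1, vx2) x [vy1, vy2). */
    public List<Annotation2D> visible(long vx1, long vx2, long vy1, long vy2) {
        List<Annotation2D> out = new ArrayList<>();
        // A feature with x1 < vx1 - maxWidth ends before vx1, so start the scan there;
        // stop once x1 reaches the right edge, since the array is sorted by x1.
        for (int i = lowerBound(vx1 - maxWidth); i < byX1.length && byX1[i].x1() < vx2; i++) {
            Annotation2D f = byX1[i];
            if (f.x2() > vx1 && f.y1() < vy2 && f.y2() > vy1) {
                out.add(f);
            }
        }
        return out;
    }

    /** Index of the first feature whose x1 >= key (binary search). */
    private int lowerBound(long key) {
        int lo = 0, hi = byX1.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (byX1[mid].x1() < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

Sorting once and remembering the widest feature bounds how far left a query must start, so a zoomed-in view scans only a small slice of the list. Very wide features (for example large domains) weaken that bound; a grid or interval tree would handle that case better.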


sa501428 commented 9 years ago

I'm pretty sure we can get at least some speedup for 2D annotations. We could do spatial clustering of the 2D annotations, and thus only render nearby components. More critically, I believe the main slowdown is occurring because Juicebox keeps track of which features it highlights/selects (relative to the mouse). Instead of iterating through all visible loops, we should only search through those nearest to the mouse click. I think this would give us a massive speedup.
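The idea of only checking features near the mouse can be sketched with a uniform grid: each feature is registered in every cell its rectangle overlaps, and a hover or click inspects only the cells around the cursor. This is a hypothetical Java 16+ illustration, not the change made in the closing commit; `GridIndex`, `Annotation2D`, and the tolerance `tol` are assumed names, and it assumes non-negative coordinates and half-open rectangles with `x2 > x1` and `y2 > y1`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical uniform-grid index for hit-testing 2D annotations near the mouse. */
public final class GridIndex {

    /** Hypothetical stand-in for a 2D annotation: [x1, x2) x [y1, y2). */
    public record Annotation2D(long x1, long x2, long y1, long y2) {}

    private final long cellSize;
    private final Map<Long, List<Annotation2D>> cells = new HashMap<>();

    public GridIndex(long cellSize) {
        this.cellSize = cellSize;
    }

    // Packs (column, row) into one key; assumes fewer than 2^32 rows.
    private static long key(long col, long row) {
        return (col << 32) | row;
    }

    /** Registers the feature in every cell its rectangle overlaps. */
    public void add(Annotation2D f) {
        for (long c = f.x1() / cellSize; c <= (f.x2() - 1) / cellSize; c++) {
            for (long r = f.y1() / cellSize; r <= (f.y2() - 1) / cellSize; r++) {
                cells.computeIfAbsent(key(c, r), k -> new ArrayList<>()).add(f);
            }
        }
    }

    /** Features whose rectangle, grown by tol on every side, contains (px, py). */
    public Set<Annotation2D> near(long px, long py, long tol) {
        // Identity set: a feature stored in several cells is reported once,
        // while distinct features with equal coordinates are both kept.
        Set<Annotation2D> hits = Collections.newSetFromMap(new IdentityHashMap<>());
        long c0 = Math.max(0, px - tol) / cellSize, c1 = (px + tol) / cellSize;
        long r0 = Math.max(0, py - tol) / cellSize, r1 = (py + tol) / cellSize;
        for (long c = c0; c <= c1; c++) {
            for (long r = r0; r <= r1; r++) {
                for (Annotation2D f : cells.getOrDefault(key(c, r), List.of())) {
                    if (px >= f.x1() - tol && px < f.x2() + tol
                            && py >= f.y1() - tol && py < f.y2() + tol) {
                        hits.add(f);
                    }
                }
            }
        }
        return hits;
    }
}
```

The cell size trades memory for query speed: large features get copied into many cells, while cells that are too large hand each query too many candidates to test.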

sanjitsbatra commented 9 years ago

Brilliant. Thank you so much, Muhammad! This is great!

Muhammad Saad Shamim closed #162 via commit 557a668 on Aug 8, 2015: https://github.com/theaidenlab/Juicebox/commit/557a668f6c431e3e66d5e2c9227136ba2bf5c461