Circuitscape / Circuitscape.jl

Algorithms from circuit theory to predict connectivity in heterogeneous landscapes
https://circuitscape.org
MIT License

Potential speedups #174

Closed glaroc closed 4 years ago

glaroc commented 5 years ago

Hi, I did some profiling on my Circuitscape runs and identified a couple of bottlenecks.

On Julia 1.0.x, there seems to be a performance issue with the `readdlm()` function: it is MUCH faster on 1.1.0 when reading large .asc files. Just switching Julia versions decreased the run times by over 30%.

In out.jl (line 176), I uncommented the line `s = vec(sum(branch_currents, 1))`, commented out the slow loop that followed it, and updated the call to the Julia 1.0 keyword syntax:

`s = vec(sum(branch_currents, dims=1))`

That also ran much faster.
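For reference, a minimal sketch of that column-sum call, using a small stand-in matrix in place of the real `branch_currents` array:

```julia
# Stand-in for the real branch_currents matrix, for illustration only.
branch_currents = [1.0 2.0; 3.0 4.0]

# Julia 0.6 wrote sum(branch_currents, 1); since Julia 1.0 the reduction
# dimension is passed as a keyword argument:
s = vec(sum(branch_currents, dims=1))  # column sums flattened to a vector
```

`sum(..., dims=1)` returns a 1×n matrix, so `vec` is needed to get a flat vector back.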

Thanks for the great work on this! The Julia version is awesome. The only issue for us now is the high RAM consumption which makes it nearly impossible to use on multi-core HPCs. A single run consumes about 150GB of RAM, so running multiple runs in parallel is not really possible.

ViralBShah commented 5 years ago

Why is it using 150GB of RAM? Is that because you have a very large landscape? Once we have multi-threading (in maybe Julia 1.2 or 1.3), it should be much easier to use multiple cores effectively. Currently, multi-processing balloons RAM usage, so it is only useful for smaller grids.

ViralBShah commented 5 years ago

Also @glaroc If you are comfortable, it would be great to get a PR.

glaroc commented 5 years ago

My grid is 13732 x 9827. I'm not sure why it's consuming so much RAM.

I'll do a pull request now.

ViralBShah commented 5 years ago

Thank you. Can you share your .ini file and data files with @ranjanan to look into it?

glaroc commented 5 years ago

Sure I can share the files. Should I send them by email?

ranjanan commented 5 years ago

Just an update: I saved @glaroc's matrix and profiled the preconditioner allocation:

```julia
julia> @time smoothed_aggregation(a)
136.883718 seconds (2.01 M allocations: 147.271 GiB, 3.63% gc time)
```

Most of the allocations, however, are at this step and this step. I have some ideas on how to reduce their memory footprints. More updates soon!
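Allocation hotspots like these can sometimes be narrowed down with `@allocated`; a minimal sketch, using small stand-in sparse matrices rather than the real landscape data:

```julia
using SparseArrays

# Stand-in operators; sizes and density are made up for illustration.
B = sprand(500, 500, 0.01)   # square sparse operator
A = sprand(500, 50, 0.05)    # tall rectangular sparse operator

# Compare the bytes allocated by each candidate step.
bytes_mul    = @allocated B * A          # single sparse multiply
bytes_triple = @allocated A' * (B * A)   # full triple product
println((bytes_mul, bytes_triple))
```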

ViralBShah commented 5 years ago

Are those sparse matrix multiplies?

ranjanan commented 5 years ago

Yes, they are.


ViralBShah commented 5 years ago

Is this fixed?

glaroc commented 5 years ago

Any updates on this issue? I'll need to run a bunch of tests soon with the same raster dimensions.

ViralBShah commented 5 years ago

@ranjanan How do you suggest reducing the memory consumption in the sparse matmuls?

ranjanan commented 5 years ago

I haven't gotten a chance to work on this yet, but I'll give it a good go this weekend. We just need that Galerkin operator `A'*B*A` optimised.
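For context, a minimal sketch of the Galerkin product on stand-in sparse matrices (the sizes and density here are hypothetical, chosen only to show the shapes involved):

```julia
using SparseArrays, LinearAlgebra

# In AMG, A is the n×m restriction/prolongation operator and B is the
# n×n fine-grid operator being coarsened. Sizes here are illustrative.
n, m = 1_000, 100
B = sprand(n, n, 0.01); B = B + B'   # symmetric sparse operator
A = sprand(n, m, 0.05)

# The triple product forms the intermediate B*A before the final
# multiply, which is where the extra allocations come from.
coarse = A' * (B * A)   # m×m coarse-grid operator
```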

ViralBShah commented 5 years ago

Ah yes, that should be easy. @andreasnoack Do we have a general facility to overload these?

andreasnoack commented 5 years ago

A is a matrix, right? In that case, we don't have a function for that in LinearAlgebra. In the vector case, we just introduced a three-argument `dot`.
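A sketch of that vector case, assuming a Julia version where three-argument `dot` is available: it computes `x' * A * y` without materialising the intermediate vector `A * y`.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
x = [1.0, 0.0]
y = [0.0, 1.0]

# Three-argument dot evaluates x' * A * y in one pass,
# avoiding the temporary from A * y.
d = dot(x, A, y)
```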

ViralBShah commented 5 years ago

Yes. That's right.

ViralBShah commented 5 years ago

@ranjanan the `A'*B*A` can't be easily calculated without forming an intermediate product, so there are no easy savings there. This problem may simply be big enough that 150GB of RAM is expected.

andreasnoack commented 5 years ago

What are the shapes of A and B?

ViralBShah commented 5 years ago

B is square (n×n) and A is rectangular (n×m), with n > m. The operation is basically coarsening B.

ViralBShah commented 4 years ago

The suggested change is already implemented. Thanks @glaroc

glaroc commented 4 years ago

@ViralBShah I'm curious to know whether there has been any progress on the memory issues discussed in this thread?

ViralBShah commented 4 years ago

The next plan is to use multi-threading, and we hope it will use less memory in parallel (for certain solve modes). On the specific sparse matmul discussed above, we don't expect improvements in that operation.

Once we have multi-threaded Circuitscape (#197), we should do some memory profiling and see if there are other opportunities. Multi-threading may also improve the running time of each solve, offering an alternate path to using all the cores.