Circuitscape / Circuitscape.jl

Algorithms from circuit theory to predict connectivity in heterogeneous landscapes
https://circuitscape.org
MIT License

Run finishes before all pairs are solved in pairwise mode #109

Closed: meganjennings closed this issue 6 years ago

meganjennings commented 6 years ago

I have been trying out the new release of Circuitscape in Julia and am having trouble getting the runs to complete. Perhaps I'm missing something?

My graph has 2,013,771 nodes, 200 focal points, and 13 connected components (.ini file pasted below). Everything starts just fine on the 19,900 pair solves, but each time it stops when it gets to pair 199 of 19,900. There is no error message, only a completion message after the 199th solve:

Solving pair 199 of 19900
[2018-04-18T16:57:49 | info | root]: Time taken to solve linear system = 10.036304987 seconds
[2018-04-18T16:57:50 | info | root]: Time taken to complete job = 2128.810110815

Thank you for all your work to upgrade Circuitscape!

[Options for advanced mode]
ground_file_is_resistances = True
remove_src_or_gnd = keepall
ground_file = (Browse for a ground point file)
use_unit_currents = False
source_file = (Browse for a current source file)
use_direct_grounds = False

[Mask file]
mask_file = (Browse for a raster mask file)
use_mask = False

[Calculation options]
low_memory_mode = False
parallelize = True
solver = cg+amg
print_timings = 1
preemptive_memory_release = False
print_rusages = False
max_parallel = 3

[Short circuit regions (aka polygons)]
polygon_file = (Browse for a short-circuit region file)
use_polygons = False

[Options for one-to-all and all-to-one modes]
use_variable_source_strengths = False
variable_source_file = (Browse for a source strength file)

[Output options]
set_null_currents_to_nodata = False
set_focal_node_currents_to_zero = False
set_null_voltages_to_nodata = False
compress_grids = False
write_cur_maps = 1
write_volt_maps = False
output_file = D:\LCC_runs\wrentit\wrentit_run2.out
write_cum_cur_map_only = True
log_transform_maps = False
write_max_cur_maps = False

[Version]
version = 4.0.5

[Options for reclassification of habitat data]
reclass_file = (Browse for file with reclassification data)
use_reclass_table = False

[Logging Options]
log_level = INFO
log_file = D:\LCC_runs\wrentit\wrentit_run2.log
profiler_log_file = D:\LCC_runs\wrentit\wrentit_run2_rusages.log
screenprint_log = False

[Options for pairwise and one-to-all and all-to-one modes]
included_pairs_file = (Browse for a file with pairs to include or exclude)
use_included_pairs = False
point_file = D:\LCC_runs\wrentit\CS_inputs\wrentit_200_pts_nad83albers_2.asc

[Connection scheme for raster habitat data]
connect_using_avg_resistances = True
connect_four_neighbors_only = False

[Habitat raster or graph]
habitat_map_is_resistances = True
habitat_file = D:\LCC_runs\wrentit\CS_inputs\chfa_res_270_2.asc

[Circuitscape mode]
data_type = raster
scenario = pairwise

ranjanan commented 6 years ago

Hi @meganjennings, thank you for using the new upgraded Circuitscape!

What seems to have happened is that Circuitscape entered the shortcut resistance calculation mode by mistake. Circuitscape normally enters this mode when write_cur_maps and write_volt_maps are both set to false; it is meant for folks who are only interested in the resistance values. My guess is that in your case the package parsed your write_cur_maps = 1 as false, which shouldn't happen. #112 should fix that.
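
To illustrate the kind of fix involved, an INI flag parser along these lines (a minimal sketch, not the package's actual code; the real fix is in #112) would treat 1 as true:

# Sketch of truthy parsing for INI flags such as write_cur_maps.
truthy(s::AbstractString) = lowercase(strip(s)) in ("true", "1", "yes")

truthy("1")      # true, the case that was being mis-read as false
truthy("False")  # false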

Ideally, you would have seen a log message telling you that it entered shortcut mode, but that didn't happen, which is why this output is confusing. #111 fixes this by improving the logging on the shortcut path.

Once both #111 and #112 are merged, could you try the following to see if it works?

In your Julia terminal paste:

Pkg.checkout("Circuitscape")

This will put you on the latest master version of Circuitscape (with the fixes). Once you do that, could you restart Julia, try again and see if it runs as you expect?
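
In full, the sequence looks like this ("path/to/your.ini" is a placeholder for your own file):

Pkg.checkout("Circuitscape")   # check out the latest master (Julia 0.6 package manager)
# restart Julia, then:
using Circuitscape
compute("path/to/your.ini")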

meganjennings commented 6 years ago

Hi @ranjanan. Thanks for the quick reply! I tried updating the package and running again, and it stopped again at 199 of 19,900 pair solves. However, I'm not sure Circuitscape actually updated. It gave me the following:

INFO: Checking out Circuitscape master...
INFO: Pulling Circuitscape latest master...
INFO: No packages to install, update or remove

I'm new to Julia, but the last line has me thinking the update with the new fixes didn't happen.

ranjanan commented 6 years ago

Did you restart Julia? Restarting Julia and loading the module again via using Circuitscape allows Julia to load the new (fixed) version of Circuitscape. Apologies if you already have, I'm just checking.

ranjanan commented 6 years ago

Also, on a completely separate note, I noticed your problem is ~2M nodes in size. You could also consider using the new CHOLMOD solver mode by changing the solver line in your INI file to:

solver = cholmod

This mode should give you the answer much faster.

Perhaps once you see if the fixes work for you, you could try this new mode, if you are interested.
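
The speedup comes from doing one expensive Cholesky factorization up front and then reusing it for every pair solve. A minimal sketch of the idea (not Circuitscape's actual code; written in current Julia syntax):

using SparseArrays, LinearAlgebra

# Build a small random weighted graph Laplacian (a stand-in for the landscape graph).
n = 1_000
A = sprand(n, n, 0.01); A = A + A'             # symmetric conductances
L = spdiagm(0 => vec(sum(A, dims = 2))) - A    # graph Laplacian
Lg = L[1:end-1, 1:end-1] + 1e-8I               # ground a node; tiny shift keeps it positive definite

F = cholesky(Lg)            # CHOLMOD factorization: expensive, but done only once
for _ in 1:5                # stand-in for thousands of pair solves
    b = zeros(n - 1); b[1] = 1.0   # unit current injected at one focal node
    v = F \ b               # each solve reuses the factor, so it's cheap
end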

meganjennings commented 6 years ago

Thanks. Restarting Julia worked and I got it running! A few questions: does this version write a cumulative current raster like the original Circuitscape? I can't find it. Also, is there an option to run without producing all the per-pair current maps, as in the original Circuitscape? With such a large problem, I need ~500GB of free space just to get through a run.

Also, I have tried to run the CHOLMOD solver mode and would love to compare outputs but I get a memory error:

[2018-04-22T23:00:36 | info | root]: Precision used: Double
[2018-04-22T23:00:36 | info | root]: Starting up Circuitscape to use 3 processes in parallel
[2018-04-22T23:00:49 | info | root]: Reading maps
[2018-04-22T23:00:54 | info | root]: Resistance/Conductance map has 2013771 nodes
[2018-04-22T23:00:59 | info | root]: Solver used: CHOLMOD
[2018-04-22T23:00:59 | info | root]: Graph has 2013771 nodes, 200 focal points and 13 connected components
[2018-04-22T23:01:00 | info | root]: Total number of pair solves = 19900
[2018-04-22T23:01:13 | info | root]: Time taken to construct cholesky factor = 12.975029729
[2018-04-22T23:01:14 | info | root]: Time taken to construct local nodemap = 0.47744158 seconds

ERROR: OutOfMemoryError()
Stacktrace:
 [1] Array{Float64,N} where N(::Tuple{Int64,Int64}) at .\boot.jl:317
 [2] _cholmod_solver_path(::Circuitscape.GraphData{Float64,Int32}, ::Circuitscape.RasterFlags, ::Dict{String,String}) at C:\Users\mjennings\.julia\v0.6\Circuitscape\src\core.jl:372
 [3] single_ground_all_pairs(::Circuitscape.GraphData{Float64,Int32}, ::Circuitscape.RasterFlags, ::Dict{String,String}) at C:\Users\mjennings\.julia\v0.6\Circuitscape\src\core.jl:52
 [4] _pt_file_no_polygons_path(::Circuitscape.RasData{Float64,Int32}, ::Circuitscape.RasterFlags, ::Dict{String,String}) at C:\Users\mjennings\.julia\v0.6\Circuitscape\src\raster\pairwise.jl:61
 [5] raster_pairwise(::Type{T} where T, ::Dict{String,String}) at C:\Users\mjennings\.julia\v0.6\Circuitscape\src\raster\pairwise.jl:29
 [6] _compute(::Type{T} where T, ::Dict{String,String}) at C:\Users\mjennings\.julia\v0.6\Circuitscape\src\run.jl:37
 [7] macro expansion at .\util.jl:293 [inlined]
 [8] compute(::String) at C:\Users\mjennings\.julia\v0.6\Circuitscape\src\run.jl:26

ranjanan commented 6 years ago

@meganjennings sorry, but I didn't include that yet. I was debating whether users would want it, given they would already have all the pairwise current maps. Clearly, I was wrong. Could you give me a day or two to implement it? I apologise for the delay. This kind of insight into a user's workflow is exactly what I needed, and you're generously providing it.

As for the CHOLMOD error, I think I know where it is. Could you tell me how much memory (RAM) your computer has?

meganjennings commented 6 years ago

@ranjanan no problem! I'm really excited to have something in the works that is so much more efficient, and I'm glad to hear my experiences are helpful to your programming process. Even in CG+AMG solver mode, this problem takes only ~35 hours in Julia, compared to 8 days with the original Circuitscape.

The cumulative current maps are the primary output that I use (and I think this is the case for many other Circuitscape users), so if you're able to make that available, that would be great.

I tested the CHOLMOD solver mode on two computers. The one that generated the error I posted has only 8GB of RAM, but the other has 24GB. I believe the error was the same on both, but I can double-check if that's helpful to you.

ranjanan commented 6 years ago

@meganjennings I'm really happy to see you're getting good speedups. If you already have all the pairwise current maps, I can give you a utility now that reads all of them and generates the cumulative current map, in case you're in a hurry and don't want to run everything again; let me know if you're interested. I do realise, though, that writing only cumulative current maps (and no per-pair current maps at all) is important too, so I'll work on including that by default as well.
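
Conceptually the utility just reads each per-pair current map (an ESRI ASCII grid) and sums them cell by cell. A rough sketch, with an assumed file-naming filter and no NODATA handling:

using DelimitedFiles

function accumulate_current_maps(dir::AbstractString)
    # Assumed naming filter for per-pair current maps; adjust to your output names.
    files = filter(f -> occursin("_curmap", f) && endswith(f, ".asc"), readdir(dir))
    cum = nothing
    for f in files
        grid = readdlm(joinpath(dir, f), skipstart = 6)  # skip the 6-line ASC header
        cum = cum === nothing ? grid : cum .+ grid
    end
    return cum
end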

As for the CHOLMOD mode, the problem is that I'm internally creating a really large array that blows up memory. I never realised users would want to solve ~20k pairs (another way your work is helping!) and erroneously assumed they would solve at most ~200. Solving in batches should fix that, and I'll push that change soon too.
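
The back-of-the-envelope for your problem, assuming one dense Float64 right-hand-side column per pair, shows why:

nodes, pairs = 2_013_771, 19_900
nodes * pairs * 8 / 2^30   # all pairs at once: ~299 GB of dense columns
nodes * 1_000 * 8 / 2^30   # a 1000-pair batch: ~15 GB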

Thank you once more for all your help and understanding!

meganjennings commented 6 years ago

@ranjanan The utility to read the pairwise current maps and generate a cumulative current raster would be helpful in the interim while you implement the necessary fixes to make that a default output. Thanks for offering that up. And I really appreciate your willingness to jump on these fixes so quickly.

I look forward to trying out the CHOLMOD mode whenever it's ready. The problem I'm working on now is by far the largest I've created, but none of my past problems have been as small as 200 pairs either. I have previously coarsened the grid cell size to reduce overall problem size, but only as much as needed while still including as many source points as I think are necessary. I generally scale down to smaller problems only to keep the solve time to a 7-8 day maximum.

ranjanan commented 6 years ago

@meganjennings #115 adds the utility to read all the current maps of a particular run and accumulate them. To use:

using Circuitscape
calculate_cum_current_map("path/to/output/file")

In your case, this might be (using forward slashes so Julia doesn't treat the Windows backslashes as escape characters):

calculate_cum_current_map("D:/LCC_runs/wrentit/wrentit_run2.out")

If you'd like to calculate max current maps instead, do:

calculate_max_current_map("D:/LCC_runs/wrentit/wrentit_run2.out")

ranjanan commented 6 years ago

Oh and you'd have to do the same thing as earlier:

Pkg.checkout("Circuitscape")

and restart Julia and try.

ranjanan commented 6 years ago

@meganjennings could I ask you if the utility helped? I'm making good progress on #119, but it took me longer than I expected.

meganjennings commented 6 years ago

Yes @ranjanan! Sorry, I got busy with other things and forgot to report back. The utility worked exactly as I needed it to, thank you. It did take an additional ~18 hrs to process though, so the fix you're working on now to have cumulative (or max) current maps be a default output will be very helpful.

ranjanan commented 6 years ago

Glad to hear it worked, but I'm not happy about the additional ~18 hrs. My work on including cumulative maps by default is on a branch (par, if you'd like to check it out). It seems to work fine, and performance looks good for raster pairwise mode, but I'm still fixing up cumulative currents in network mode.

this problem takes only ~35 hours in Julia compared to 8 days with original Circuitscape

@meganjennings would you mind if I reached out to you via email to get a quote from you about your problem? User benchmarks would add to the credibility of the new version. Plus, I'd mention your work everywhere I speak about the new version! 😄 Would you be interested in that?

meganjennings commented 6 years ago

@ranjanan sure thing! My email is public on my GitHub account. Feel free to drop me a line.

I have additional problems to run (all the same size) and can give those a try once the fix is ready. I'm curious to see how the total run time compares with the cumulative map output integrated into the original run.

ranjanan commented 6 years ago

@meganjennings I just got done with #123 and #126, which introduce cumulative maps and CHOLMOD batching. You should now be able to write only cumulative maps using the write_cum_cur_map_only = true option, and the CHOLMOD mode now supports batching, which should let your workload run in that mode. As usual, do Pkg.checkout("Circuitscape"), then restart Julia and try. Thank you!
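
For reference, the relevant INI lines (the first is already in your file) would be:

[Output options]
write_cum_cur_map_only = True

[Calculation options]
solver = cholmod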

meganjennings commented 6 years ago

@ranjanan I finally got around to trying my run using solver=cholmod but am getting a new memory error. My .ini file is the same as before, only with the solver changed, and I get the output messages below.

compute("K:/Modeling/Connectivity/Wrentit/wrentit_run_29May2018.ini") [2018-05-29T09:47:02 | info | root]: Precision used: Double [2018-05-29T09:47:02 | info | root]: Starting up Circuitscape to use 3 processes in parallel [2018-05-29T09:47:15 | info | root]: Reading maps [2018-05-29T09:47:23 | info | root]: Resistance/Conductance map has 2013771 nodes [2018-05-29T09:47:32 | info | root]: Solver used: CHOLMOD [2018-05-29T09:47:32 | info | root]: Graph has 2013771 nodes, 200 focal points and 13 connected components [2018-05-29T09:47:33 | info | root]: Total number of pair solves = 19900 [2018-05-29T09:47:57 | info | root]: Time taken to construct cholesky factor = 23.249864448 [2018-05-29T09:47:57 | info | root]: Time taken to construct local nodemap = 0.682627123 seconds [2018-05-29T09:48:15 | info | root]: Solving points 1 to 1000 Worker 7 terminated. Worker 8 terminated.ERROR (unhandled task failure): read: connection reset by peer (ECONNRESET)

ERROR (unhandled task failure): read: connection reset by peer (ECONNRESET)
ERROR: On worker 2: OutOfMemoryError()
Stacktrace:
 [1] #573 at .\asyncmap.jl:178 [inlined]
 [2] foreach(::Base.##573#575, ::Array{Any,1}) at .\abstractarray.jl:1733
 [3] maptwice(::Function, ::Channel{Any}, ::Array{Any,1}, ::UnitRange{Int64}, ::Vararg{UnitRange{Int64},N} where N) at .\asyncmap.jl:178
 [4] wrap_n_exec_twice(::Channel{Any}, ::Array{Any,1}, ::Base.Distributed.##204#207{WorkerPool}, ::Function, ::UnitRange{Int64}, ::Vararg{UnitRange{Int64},N} where N) at .\asyncmap.jl:154
 [5] #async_usemap#558(::Function, ::Void, ::Function, ::Base.Distributed.##188#190, ::UnitRange{Int64}, ::Vararg{UnitRange{Int64},N} where N) at .\asyncmap.jl:103
 [6] (::Base.#kw##async_usemap)(::Array{Any,1}, ::Base.#async_usemap, ::Function, ::UnitRange{Int64}, ::Vararg{UnitRange{Int64},N} where N) at .\:0
 [7] (::Base.#kw##asyncmap)(::Array{Any,1}, ::Base.#asyncmap, ::Function, ::UnitRange{Int64}) at .\:0
 [8] #pmap#203(::Bool, ::Int64, ::Void, ::Array{Any,1}, ::Void, ::Function, ::WorkerPool, ::Function, ::UnitRange{Int64}) at .\distributed\pmap.jl:126
 [9] pmap(::WorkerPool, ::Function, ::UnitRange{Int64}) at .\distributed\pmap.jl:101
 [10] #pmap#213(::Array{Any,1}, ::Function, ::Function, ::UnitRange{Int64}) at .\distributed\pmap.jl:156
 [11] _cholmod_solver_path(::Circuitscape.GraphData{Float64,Int32}, ::Circuitscape.RasterFlags, ::Dict{String,String}, ::Bool, ::Int64) at C:\Users\Megan\.julia\v0.6\Circuitscape\src\core.jl:419
 [12] single_ground_all_pairs(::Circuitscape.GraphData{Float64,Int32}, ::Circuitscape.RasterFlags, ::Dict{String,String}, ::Bool) at C:\Users\Megan\.julia\v0.6\Circuitscape\src\core.jl:62
 [13] _pt_file_no_polygons_path(::Circuitscape.RasData{Float64,Int32}, ::Circuitscape.RasterFlags, ::Dict{String,String}) at C:\Users\Megan\.julia\v0.6\Circuitscape\src\raster\pairwise.jl:61
 [14] raster_pairwise(::Type{T} where T, ::Dict{String,String}) at C:\Users\Megan\.julia\v0.6\Circuitscape\src\raster\pairwise.jl:29
 [15] _compute(::Type{T} where T, ::Dict{String,String}) at C:\Users\Megan\.julia\v0.6\Circuitscape\src\run.jl:37
 [16] macro expansion at .\util.jl:293 [inlined]
 [17] compute(::String) at C:\Users\Megan\.julia\v0.6\Circuitscape\src\run.jl:26

ViralBShah commented 6 years ago

How big is your matrix? It looks like it is too big for a Cholesky factorization, which is why you see the out-of-memory error.

ViralBShah commented 6 years ago

@ranjanan We should catch this exception and give a meaningful error to the user.

ViralBShah commented 6 years ago

@meganjennings Do you have access to a bigger computer with more RAM? The other option is to disable parallelization, which will reduce memory usage.
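
Disabling it is a one-line change in the [Calculation options] section of the INI file:

[Calculation options]
parallelize = False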

meganjennings commented 6 years ago

@ViralBShah Thanks for the quick responses. The problem has ~2 million nodes and 19,900 pair solves. This computer has 24GB of RAM and that's the most I have access to at the moment. I'm trying a run without the parallelization right now and so far so good. It's still running and has already gotten past the point where it failed previously.

meganjennings commented 6 years ago

@ViralBShah and @ranjanan I tried a new solver=cholmod run 2 days ago with no parallelization, and it started but never seemed to progress. When I went to copy the output to paste here just now, it finally threw another memory error. Any other thoughts on what's going on, or is this still just a straight-up matrix-size and memory limitation?

julia> compute("K:/Modeling/Connectivity/Wrentit/wrentit_run_29May2018.ini")
[2018-05-30T10:34:26 | info | root]: Precision used: Double
[2018-05-30T10:34:26 | info | root]: Reading maps
[2018-05-30T10:34:43 | info | root]: Resistance/Conductance map has 2013771 nodes
[2018-05-30T10:34:51 | info | root]: Solver used: CHOLMOD
[2018-05-30T10:34:51 | info | root]: Graph has 2013771 nodes, 200 focal points and 13 connected components
[2018-05-30T10:34:51 | info | root]: Total number of pair solves = 19900
[2018-05-30T10:35:23 | info | root]: Time taken to construct cholesky factor = 25.528565257
[2018-05-30T10:35:24 | info | root]: Time taken to construct local nodemap = 0.791731687 seconds
[2018-05-30T10:35:40 | info | root]: Solving points 1 to 1000
Worker 5 terminated.
ERROR (unhandled task failure): read: connection reset by peer (ECONNRESET)
ERROR: On worker 6: OutOfMemoryError()
Stacktrace:
 [1] #573 at .\asyncmap.jl:178 [inlined]
 [2] foreach(::Base.##573#575, ::Array{Any,1}) at .\abstractarray.jl:1733
 [3] maptwice(::Function, ::Channel{Any}, ::Array{Any,1}, ::UnitRange{Int64}, ::Vararg{UnitRange{Int64},N} where N) at .\asyncmap.jl:178
 [4] wrap_n_exec_twice(::Channel{Any}, ::Array{Any,1}, ::Base.Distributed.##204#207{WorkerPool}, ::Function, ::UnitRange{Int64}, ::Vararg{UnitRange{Int64},N} where N) at .\asyncmap.jl:154
 [5] #async_usemap#558(::Function, ::Void, ::Function, ::Base.Distributed.##188#190, ::UnitRange{Int64}, ::Vararg{UnitRange{Int64},N} where N) at .\asyncmap.jl:103
 [6] (::Base.#kw##async_usemap)(::Array{Any,1}, ::Base.#async_usemap, ::Function, ::UnitRange{Int64}, ::Vararg{UnitRange{Int64},N} where N) at .\:0
 [7] (::Base.#kw##asyncmap)(::Array{Any,1}, ::Base.#asyncmap, ::Function, ::UnitRange{Int64}) at .\:0
 [8] #pmap#203(::Bool, ::Int64, ::Void, ::Array{Any,1}, ::Void, ::Function, ::WorkerPool, ::Function, ::UnitRange{Int64}) at .\distributed\pmap.jl:126
 [9] pmap(::WorkerPool, ::Function, ::UnitRange{Int64}) at .\distributed\pmap.jl:101
 [10] #pmap#213(::Array{Any,1}, ::Function, ::Function, ::UnitRange{Int64}) at .\distributed\pmap.jl:156
 [11] _cholmod_solver_path(::Circuitscape.GraphData{Float64,Int32}, ::Circuitscape.RasterFlags, ::Dict{String,String}, ::Bool, ::Int64) at C:\Users\Megan\.julia\v0.6\Circuitscape\src\core.jl:419
 [12] single_ground_all_pairs(::Circuitscape.GraphData{Float64,Int32}, ::Circuitscape.RasterFlags, ::Dict{String,String}, ::Bool) at C:\Users\Megan\.julia\v0.6\Circuitscape\src\core.jl:62
 [13] _pt_file_no_polygons_path(::Circuitscape.RasData{Float64,Int32}, ::Circuitscape.RasterFlags, ::Dict{String,String}) at C:\Users\Megan\.julia\v0.6\Circuitscape\src\raster\pairwise.jl:61
 [14] raster_pairwise(::Type{T} where T, ::Dict{String,String}) at C:\Users\Megan\.julia\v0.6\Circuitscape\src\raster\pairwise.jl:29
 [15] _compute(::Type{T} where T, ::Dict{String,String}) at C:\Users\Megan\.julia\v0.6\Circuitscape\src\run.jl:37
 [16] macro expansion at .\util.jl:293 [inlined]
 [17] compute(::String) at C:\Users\Megan\.julia\v0.6\Circuitscape\src\run.jl:26

ViralBShah commented 6 years ago

Cholesky solvers take up a lot of memory; the factor is typically much denser than the original sparse matrix because of fill-in. Can we try this with the AMG code? Could you share your data files and INI file with us so we can experiment ourselves?

meganjennings commented 6 years ago

@ViralBShah Yes, I have already run this in AMG mode; I was just giving CHOLMOD mode a try because @ranjanan asked if I would give it a go with this large problem. I'm happy to send my input files if you want to see what it takes to get CHOLMOD to work with something of this size. What's the best way to transfer them?

ranjanan commented 6 years ago

@meganjennings how about uploading to Dropbox or Google Drive and commenting (or emailing) the link?

meganjennings commented 6 years ago

@ranjanan Input rasters and ini file are now up on Dropbox: https://www.dropbox.com/sh/cvk3zgxchnq00p0/AABYWDYK4HQqU7xedZqFN4Ama?dl=0

ranjanan commented 6 years ago

@meganjennings would you mind adding one particular line to your INI file and trying again? Add:

cholmod_batch_size = 100

at the end of your INI file.
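
With that batch size, the dense right-hand-side block per batch becomes small enough for a 24GB machine to hold alongside the Cholesky factor (same back-of-the-envelope as before):

2_013_771 * 100 * 8 / 2^30   # ~1.5 GB of dense RHS per 100-pair batch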

meganjennings commented 6 years ago

@ranjanan I just got it up and running and so far, so good! It seems to be working and is cranking through current maps. I'll let you know when it finishes if it all turned out OK.

ranjanan commented 6 years ago

Great to hear! Do let me know. Also, I was hoping you could send me the results of your benchmarking whenever they're done, if you consent. They would greatly help while I'm preparing my presentation on the Julia package. Thank you!

ViralBShah commented 6 years ago

In general, we should have a default batch size and not expect users to set it.
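
A heuristic along these lines could work (a hypothetical sketch, not in the package): size the batch so that one dense Float64 right-hand-side block uses only a fraction of the machine's free RAM.

# Hypothetical default; n_nodes and n_pairs would come from the loaded problem.
function default_batch_size(n_nodes::Integer, n_pairs::Integer; mem_fraction = 0.2)
    budget = mem_fraction * Sys.free_memory()          # bytes allowed for the RHS block
    return clamp(floor(Int, budget / (8 * n_nodes)), 1, n_pairs)
end

default_batch_size(2_013_771, 19_900)   # result depends on free RAM at call time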

meganjennings commented 6 years ago

@ranjanan After a hiccup with a forced Windows update and restart, the CHOLMOD run finished and looks great: indistinguishable from the CG+AMG run as far as I can tell. It took 96,996 seconds (~27 hours) according to Julia. I lost track of which runs I did on which computer, so I'm re-running the CG+AMG run with parallel processing right now for comparison. I'll need to re-do the full original Circuitscape run for comparability, but that was taking at least 8 days, so this is nearly an order of magnitude faster!! I'll set up the original CS run before I head to Toronto so it will be finished by the time I get back. FYI, I have this running on my older desktop, and I'm finding it's a bit slower than my laptop; the desktop has more RAM but an older/slower processor. Let me know if you need my computer specs for reference.

meganjennings commented 6 years ago

@ranjanan 259,716 seconds (~72 hours) to finish in CG+AMG mode using parallel processing.

ranjanan commented 6 years ago

@meganjennings thank you for all your help testing and using this. I'm closing this thread, but please feel free to open another issue if you run into anything else.