Circuitscape / Circuitscape.jl

Algorithms from circuit theory to predict connectivity in heterogeneous landscapes
https://circuitscape.org
MIT License

Make 64-bit indexing default. Was ERROR: LoadError: InexactError: trunc(Int32, 2147483653) #200

Closed frederikvand closed 4 years ago

frederikvand commented 4 years ago

I am running Circuitscape 5.5.4 on Linux with Julia. There are 35 parallel processes with 20 GB per node (700 GB RAM). The habitat resistance rasters are Int16. After 1.5 hours I get an error related to integer conversion (an InexactError when truncating to Int32). The value in the error is larger than the largest Int32. Could that be the cause?

/buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/Pkg/src/Types.jl:1171
[ Info: 2019-11-20 14:28:21 : Logs will recorded to file: log_file
[ Info: 2019-11-20 14:28:21 : Precision used: Double
[ Info: 2019-11-20 14:28:21 : Starting up Circuitscape to use 35 processes in parallel
[ Info: 2019-11-20 14:28:33 : Reading maps
[ Info: 2019-11-20 15:12:11 : Resistance/Conductance map has 491325325 nodes
ERROR: LoadError: InexactError: trunc(Int32, 2147483653)
Stacktrace:
 [1] throw_inexacterror(::Symbol, ::Type{Int32}, ::Int64) at ./boot.jl:560
 [2] checked_trunc_sint at ./boot.jl:582 [inlined]
 [3] toInt32 at ./boot.jl:619 [inlined]
 [4] Type at ./boot.jl:709 [inlined]
 [5] convert at ./number.jl:7 [inlined]
 [6] setindex! at ./array.jl:766 [inlined]
 [7] setcolptr! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/SparseArrays/src/higherorderfns.jl:128 [inlined]
 [8] _map_zeropres!(::typeof(+), ::SparseArrays.SparseMatrixCSC{Float64,Int32}, ::SparseArrays.SparseMatrixCSC{Float64,Int32}, ::SparseArrays.SparseMatrixCSC{Float64,Int32}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/SparseArrays/src/higherorderfns.jl:304
 [9] _noshapecheck_map(::typeof(+), ::SparseArrays.SparseMatrixCSC{Float64,Int32}, ::SparseArrays.SparseMatrixCSC{Float64,Int32}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/SparseArrays/src/higherorderfns.jl:165
 [10] _shapecheckbc at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/SparseArrays/src/higherorderfns.jl:1025 [inlined]
 [11] _copy at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/SparseArrays/src/higherorderfns.jl:1015 [inlined]
 [12] copy at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/SparseArrays/src/higherorderfns.jl:1131 [inlined]
 [13] materialize at ./broadcast.jl:798 [inlined]
 [14] broadcast at ./broadcast.jl:752 [inlined]
 [15] +(::SparseArrays.SparseMatrixCSC{Float64,Int32}, ::LinearAlgebra.Adjoint{Float64,SparseArrays.SparseMatrixCSC{Float64,Int32}}) at ./arraymath.jl:39
 [16] construct_graph(::Array{Float64,2}, ::Array{Int32,2}, ::Bool, ::Bool) at /user/leuven/330/vsc33060/.julia/packages/Circuitscape/mKAhh/src/raster/pairwise.jl:389
 [17] compute_graph_data_no_polygons(::Circuitscape.RasData{Float64,Int32}, ::Circuitscape.RasterFlags) at /user/leuven/330/vsc33060/.julia/packages/Circuitscape/mKAhh/src/raster/pairwise.jl:207
 [18] _pt_file_no_polygons_path(::Circuitscape.RasData{Float64,Int32}, ::Circuitscape.RasterFlags, ::Dict{String,String}) at /user/leuven/330/vsc33060/.julia/packages/Circuitscape/mKAhh/src/raster/pairwise.jl:60
 [19] raster_pairwise(::Type, ::Type, ::Dict{String,String}) at /user/leuven/330/vsc33060/.julia/packages/Circuitscape/mKAhh/src/raster/pairwise.jl:29
 [20] _compute(::Type, ::Type, ::Dict{String,String}) at /user/leuven/330/vsc33060/.julia/packages/Circuitscape/mKAhh/src/run.jl:42
 [21] macro expansion at ./util.jl:213 [inlined]
 [22] compute(::String) at /user/leuven/330/vsc33060/.julia/packages/Circuitscape/mKAhh/src/run.jl:31
 [23] top-level scope at /data/leuven/330/vsc33060/Julia/scripts/circuittest.jl:4
 [24] include at ./boot.jl:328 [inlined]
 [25] include_relative(::Module, ::String) at ./loading.jl:1094
 [26] include(::Module, ::String) at ./Base.jl:31
 [27] exec_options(::Base.JLOptions) at ./client.jl:295
 [28] _start() at ./client.jl:464
in expression starting at /data/leuven/330/vsc33060/Julia/scripts/circuittest.jl:4
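For readers following along, here is a minimal sketch of why this conversion fails. It uses only the two numbers printed in the log (the node count and the value in the InexactError). The assumption of up to 8 stored entries per node is mine, based on connect_four_neighbors_only = False, and is only a rough upper bound. The trace fails in setcolptr!, which suggests that the running count of stored entries in the sparse matrix went past what an Int32 can hold.

```julia
# The value from the error message does not fit in a signed 32-bit integer.
failing_value = 2147483653            # copied from the InexactError in the log
@show typemax(Int32)                  # 2147483647
@show failing_value > typemax(Int32)

# Converting it to Int32 raises the same exception type as in the trace.
overflowed = try
    Int32(failing_value)
    false
catch err
    err isa InexactError
end
@show overflowed

# Rough upper bound on stored matrix entries, ASSUMING up to 8 neighbours
# per node; the true count is lower at raster edges and around nodata cells.
nodes = 491_325_325                   # from the "map has ... nodes" log line
max_entries = 8 * nodes
@show max_entries > typemax(Int32)
```

So the node count itself fits in an Int32, but the number of connections between nodes plausibly does not.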

frederikvand commented 4 years ago

[Options for advanced mode]
ground_file_is_resistances = True
remove_src_or_gnd = keepall
ground_file = (Browse for a ground point file)
use_unit_currents = False
source_file = (Browse for a current source file)
use_direct_grounds = False

[Mask file]
mask_file = None
use_mask = False

[Calculation options]
low_memory_mode = False
parallelize = True
solver = cholmod
print_timings = True
preemptive_memory_release = True
print_rusages = False
max_parallel = 35

[Short circuit regions (aka polygons)]
polygon_file = (Browse for a short-circuit region file)
use_polygons = False

[Options for one-to-all and all-to-one modes]
use_variable_source_strengths = False
variable_source_file = None

[Output options]
set_null_currents_to_nodata = False
set_focal_node_currents_to_zero = True
set_null_voltages_to_nodata = False
compress_grids = False
write_cur_maps = True
write_volt_maps = False
output_file = /user/leuven/330/vsc33060/scratch/leuven/330/vsc33060/circuitscape/circuit/test.out
write_cum_cur_map_only = False
log_transform_maps = False
write_max_cur_maps = False

[Version]
version = 5.5.4

[Options for reclassification of habitat data]
reclass_file = (Browse for file with reclassification data)
use_reclass_table = False

[Logging Options]
log_level = INFO
log_file = /user/leuven/330/vsc33060/scratch/leuven/330/vsc33060/circuitscape/circuit/log.txt
profiler_log_file = /user/leuven/330/vsc33060/scratch/leuven/330/vsc33060/circuitscape/circuit/profiler.txt
screenprint_log = False

[Options for pairwise and one-to-all and all-to-one modes]
included_pairs_file = (Browse for a file with pairs to include or exclude)
use_included_pairs = False
point_file = /user/leuven/330/vsc33060/scratch/leuven/330/vsc33060/circuitscape/resistance_reclassed/points_circuit1911.asc

[Connection scheme for raster habitat data]
connect_using_avg_resistances = False
connect_four_neighbors_only = False

[Habitat raster or graph]
habitat_map_is_resistances = True
habitat_file = /user/leuven/330/vsc33060/scratch/leuven/330/vsc33060/circuitscape/resistance_reclassed/resistance_final1911.asc

[Circuitscape mode]
data_type = raster
scenario = pairwise

ViralBShah commented 4 years ago

What is your julia versioninfo()?

frederikvand commented 4 years ago

Thank you for having a look!

Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  EBVERSIONJULIA = 1.2.0
  EBROOTJULIA = /apps/leuven/skylake/2018a/software/Julia/1.2.0
  EBDEVELJULIA = /apps/leuven/skylake/2018a/software/Julia/1.2.0/easybuild/Julia-1.2.0-easybuild-devel

ViralBShah commented 4 years ago

Seems like we should be using sparse matrices with 64-bit indices here. @ranjanan Is this an easy fix? It would be ok to use 64-bit int sparse matrices by default.
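To illustrate what "64-bit indices" means here, the sketch below builds the same tiny matrix with Int32 and with Int64 indices using the SparseArrays standard library. The 3x3 matrix and the variable names are made up for illustration; this is not Circuitscape's own construction code, only the same `A + A'` pattern that appears in the stack trace.

```julia
using SparseArrays

# Tiny stand-in for the landscape graph (hypothetical data).
rows = [1, 2, 3]
cols = [2, 3, 1]
vals = [1.0, 2.0, 0.5]

# The index type of the result follows the element type of rows/cols.
A32 = sparse(Int32.(rows), Int32.(cols), vals, 3, 3)   # SparseMatrixCSC{Float64,Int32}
A64 = sparse(Int64.(rows), Int64.(cols), vals, 3, 3)   # SparseMatrixCSC{Float64,Int64}

# Symmetrising with A + A' keeps the index type of the inputs, so an
# Int32 matrix stays limited to typemax(Int32) stored entries.
S32 = A32 + A32'
S64 = A64 + A64'
@show typeof(S32)
@show typeof(S64)

# An existing Int32-indexed matrix can be converted to Int64 indices.
B = SparseMatrixCSC{Float64,Int64}(A32)
@show typeof(B)
```

The trade-off is memory: each stored entry and each column pointer takes 8 bytes for its index instead of 4.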

ViralBShah commented 4 years ago

Can you try setting the following in the .ini file?

use_64bit_indexing = true

Like in https://github.com/Circuitscape/Circuitscape.jl/blob/master/test/input/raster/pairwise/16/sgVerify16.ini

frederikvand commented 4 years ago

Dear ViralBShah,

Thank you for the assistance. The script now runs 15 minutes longer, until the following bus error occurs. Does this mean that 700 GB of RAM is not sufficient?

Resources Used: cput=01:46:16,vmem=14062798144kb,walltime=01:44:30,mem=204836236kb,energy_used=0

signal (7): Bus error
in expression starting at /data/leuven/330/vsc33060/Julia/scripts/circuittest.jl:4
memmove_ssse3_back at /usr/lib64/libc.so.6 (unknown line)
unsafe_copyto! at ./array.jl:226 [inlined]
unsafe_copyto! at ./array.jl:245
copyto! at ./array.jl:275 [inlined]
copyto! at ./array.jl:287 [inlined]
copyto! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/SharedArrays/src/SharedArrays.jl:587 [inlined]
Type at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/SharedArrays/src/SharedArrays.jl:358 [inlined]
initialize_cum_maps at /user/leuven/330/vsc33060/.julia/packages/Circuitscape/mKAhh/src/utils.jl:216
compute_graph_data_no_polygons at /user/leuven/330/vsc33060/.julia/packages/Circuitscape/mKAhh/src/raster/pairwise.jl:226
_pt_file_no_polygons_path at /user/leuven/330/vsc33060/.julia/packages/Circuitscape/mKAhh/src/raster/pairwise.jl:60
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2197
raster_pairwise at /user/leuven/330/vsc33060/.julia/packages/Circuitscape/mKAhh/src/raster/pairwise.jl:29
_compute at /user/leuven/330/vsc33060/.julia/packages/Circuitscape/mKAhh/src/run.jl:42
macro expansion at ./util.jl:213 [inlined]
compute at /user/leuven/330/vsc33060/.julia/packages/Circuitscape/mKAhh/src/run.jl:31
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2197
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:323
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:411
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:362 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:772
jl_interpret_toplevel_thunk_callback at /buildworker/worker/package_linux64/build/src/interpreter.c:884
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x2b6e6489728f)
unknown function (ip: 0xffffffffffffffff)
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:893
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:815
jl_parse_eval_all at /buildworker/worker/package_linux64/build/src/ast.c:873
jl_load at /buildworker/worker/package_linux64/build/src/toplevel.c:879
include at ./boot.jl:328 [inlined]
include_relative at ./loading.jl:1094
include at ./Base.jl:31
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2191
exec_options at ./client.jl:295
_start at ./client.jl:464
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2191
unknown function (ip: 0x40192d)
unknown function (ip: 0x401533)
libc_start_main at /usr/lib64/libc.so.6 (unknown line)
unknown function (ip: 0x4015d4)
Allocations: 36417301313 (Pool: 36417266393; Big: 34920); GC: 29401
/var/spool/torque/mom_priv/jobs/50138824.tier2-p-moab-2.tier2.hpc.kuleuven.be.SC: line 16: 3186 Bus error (core dumped) julia "/data/leuven/330/vsc33060/Julia/scripts/circuittest.jl"

ViralBShah commented 4 years ago

Can you run it with just one process (no parallelism), and then increase the amount of parallelism if it works? You are only giving it 20GB per process with 35 processes, which sounds low.

ranjanan commented 4 years ago

Yes, @frederikvand, could you try with fewer processes? Just change max_parallel to something smaller. Each process gets its own copy of the landscape and does a certain number of solves.
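As a rough illustration of why the process count matters, here is a back-of-envelope estimate of the memory for one copy of the graph matrix in compressed sparse column form. Everything except the node count is an assumption on my part: up to 8 stored entries per node, Float64 values, Int64 indices, and no allowance for the solver workspace, preconditioner, or output maps. Whether each worker really holds a full copy of the matrix, rather than only the landscape raster, is also an assumption.

```julia
# Back-of-envelope size of one CSC copy of the graph matrix.
# ASSUMPTIONS (not from the thread): 8 stored entries per node,
# Float64 values, Int64 row indices and column pointers.
nodes = 491_325_325                                  # from the log above
entries = 8 * nodes                                  # upper bound on stored entries
bytes_per_entry = sizeof(Float64) + sizeof(Int64)    # value + row index
colptr_bytes = (nodes + 1) * sizeof(Int64)           # one pointer per column, plus one

matrix_gb = (entries * bytes_per_entry + colptr_bytes) / 1e9
println("one copy: about ", round(matrix_gb, digits = 1), " GB")

# If every worker held its own copy, the total would scale linearly.
for workers in (1, 12, 35)
    println(workers, " workers: about ", round(workers * matrix_gb, digits = 0), " GB")
end
```

Under these assumptions a single copy already exceeds a 20 GB per-process budget several times over, which is consistent with the advice to start with one process and scale up.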

frederikvand commented 4 years ago

Dear ranjanan and ViralBShah,

Using a different cluster with 12 processes (14 cores) and 750 GB of RAM solved some problems. However, after 3 hours and 47 minutes there is a critical write error. Would you perhaps know the cause? Thanks a lot for all the help!

/var/spool/torque/mom_priv/prologue: line 44: echo: write error: Invalid argument
┌ Error: Fatal error on process 10
│ exception =
│ val already in a list
│ Stacktrace:
│ [1] error(::String) at ./error.jl:33
│ [2] push! at ./linked_list.jl:53 [inlined]
│ [3] wait(::Base.GenericCondition{Base.Threads.SpinLock}) at ./condition.jl:101
│ [4] wait_readnb(::Sockets.TCPSocket, ::Int64) at ./stream.jl:376
│ [5] read at ./stream.jl:853 [inlined]
│ [6] message_handler_loop(::Sockets.TCPSocket, ::Sockets.TCPSocket, ::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/Distributed/src/process_messages.jl:190
│ [7] process_tcp_streams(::Sockets.TCPSocket, ::Sockets.TCPSocket, ::Bool) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/Distributed/src/process_messages.jl:140
│ [8] (::getfield(Distributed, Symbol("##105#106")){Sockets.TCPSocket,Sockets.TCPSocket,Bool})() at ./task.jl:268
└ @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/Distributed/src/process_messages.jl:236
┌ Warning: Some registries failed to update:
│ — ~/.julia/registries/General — registry dirty
└ @ Pkg.Types /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/Pkg/src/Types.jl:1171
[ Info: 2019-11-21 16:21:03 : Logs will recorded to file: log_file
[ Info: 2019-11-21 16:21:03 : Precision used: Double
[ Info: 2019-11-21 16:21:03 : Starting up Circuitscape to use 12 processes in parallel
[ Info: 2019-11-21 16:21:14 : Reading maps
[ Info: 2019-11-21 16:57:34 : Resistance/Conductance map has 491325325 nodes
[ Info: 2019-11-21 17:34:27 : Solver used: AMG accelerated by CG
[ Info: 2019-11-21 17:34:27 : Graph has 491325325 nodes, 64 focal points and 7182 connected components
[ Info: 2019-11-21 17:34:46 : Total number of pair solves = 1776
[ Info: 2019-11-21 19:53:32 : Time taken to construct preconditioner = 318.920926631 seconds
[ Info: 2019-11-21 19:55:24 : Time taken to construct local nodemap = 111.855132545 seconds
ERROR: LoadError: IOError: write: connection reset by peer (ECONNRESET)
Stacktrace:
 [1] (::getfield(Base, Symbol("##699#701")))(::Task) at ./asyncmap.jl:178
 [2] foreach(::getfield(Base, Symbol("##699#701")), ::Array{Any,1}) at ./abstractarray.jl:1920
 [3] maptwice(::Function, ::Channel{Any}, ::Array{Any,1}, ::UnitRange{Int64}) at ./asyncmap.jl:178
 [4] wrap_n_exec_twice at ./asyncmap.jl:154 [inlined]
 [5] #async_usemap#684 at ./asyncmap.jl:103 [inlined]
 [6] #async_usemap at ./none:0 [inlined]
 [7] #asyncmap#683 at ./asyncmap.jl:81 [inlined]
 [8] #asyncmap at ./none:0 [inlined]
 [9] #pmap#213(::Bool, ::Int64, ::Nothing, ::Array{Any,1}, ::Nothing, ::typeof(Distributed.pmap), ::Function, ::Distributed.WorkerPool, ::UnitRange{Int64}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/Distributed/src/pmap.jl:126
 [10] pmap(::Function, ::Distributed.WorkerPool, ::UnitRange{Int64}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/Distributed/src/pmap.jl:101
 [11] #pmap#223(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(Distributed.pmap), ::Function, ::UnitRange{Int64}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/Distributed/src/pmap.jl:156
 [12] pmap at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/Distributed/src/pmap.jl:156 [inlined]
 [13] amg_solver_path(::Circuitscape.GraphData{Float64,Int64}, ::Circuitscape.RasterFlags, ::Dict{String,String}, ::Bool) at /user/leuven/330/vsc33060/.julia/packages/Circuitscape/mKAhh/src/core.jl:221
in expression starting at /data/leuven/330/vsc33060/Julia/scripts/circuittest.jl:5
┌ Warning: Forcibly interrupting busy workers
│ exception = IOError: stream is closed or unusable
└ @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/Distributed/src/cluster.jl:1221
Worker 10 terminated.
Worker 4 terminated.

ViralBShah commented 4 years ago

Can you try with one process?

ViralBShah commented 4 years ago

Your problem appears to be very large. These errors usually happen when you are running out of memory.

ViralBShah commented 4 years ago

We can reopen if we find an underlying circuitscape issue.

ViralBShah commented 4 years ago

Actually, we should close this when we make 64-bit indexing the default. That will address one of the issues here.

frederikvand commented 4 years ago

Thanks ViralBShah. The big-memory servers are currently occupied, so I haven't been able to test it without parallel processing yet. However, when I monitored memory during the last run (14 processes for 750 GB of RAM), average memory use was 250 GB. I will let you know whether I get the same issue without parallel processing.

Have a nice weekend!

ViralBShah commented 4 years ago

Circuitscape 5.5.5 should use 64-bit indexing by default, and that should prevent this from happening.