vlandau closed this issue 4 years ago.
Yes, with 1M pixels, Circuitscape memory usage shouldn't be an issue. Have you tried memory allocation profiling? https://docs.julialang.org/en/v1/manual/profile/#Memory-allocation-analysis-1
I'm going to guess that 32-bit probably won't give you meaningful savings.
That's good to know. I'll look into profiling. I haven't tried it before. Thanks!
@ranjanan can you speak at all to the memory savings Circuitscape got with 32 bit (or was that mostly done for speed)?
Note that it is a bit rough and the actual allocation it reports may be off by a few lines. I still find it quite useful.
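For anyone following along, a minimal version of that allocation-profiling workflow might look like this (the function here is a made-up stand-in, not actual Omniscape code):

```julia
# Start Julia with: julia --track-allocation=user
using Profile

# Hypothetical stand-in for an Omniscape function we want to profile.
function build_surface(n)
    a = rand(n, n)   # allocates an n×n Float64 matrix
    return a .* 2    # the broadcast allocates a second matrix
end

build_surface(100)           # run once so compilation allocations happen now
Profile.clear_malloc_data()  # discard everything recorded so far
build_surface(100)           # only this call's allocations remain recorded
# On exit, Julia writes .mem files next to each source file with
# per-line allocation byte counts.
```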
Tacking this on here: noting that `readdlm` from `DelimitedFiles` uses a ton of memory (>5 GB) to read in a 1.2 GB .asc file, and it all remains allocated.
We should probably use something else. Can CSV.jl read these files? Or make GDAL the default reader?
`readdlm` does not receive much attention; it isn't meant for such large files and is more of a convenience function.
CSV.jl currently seems to error out while reading our ASC files: https://github.com/JuliaData/CSV.jl/issues/583. As for GDAL, I did an informal benchmark trying to read a 300 MB ASC file (https://github.com/Circuitscape/BigTests/blob/master/96m/cellmap.asc.gz):
```julia
julia> @time a = readdlm("../96m/cellmap.asc", skipstart=6);
 22.969700 seconds (1.92 M allocations: 4.173 GiB, 4.06% gc time)
```
and with ArchGDAL:
```julia
julia> @time b = ArchGDAL.read(ArchGDAL.read("../96m/cellmap.asc"), 1);
 14.599338 seconds (86.79 k allocations: 370.527 MiB, 0.13% gc time)
```
We certainly need to shift away from `readdlm`. Let's wait for the CSV.jl benchmark as well; it supports multithreaded reading, so let's see how much gain we get from that too.
Thanks! That is great advice. I might play around with ArchGDAL. Would I also add GDAL_jll.jl as a dependency so that GDAL gets installed (easily)?
EDIT: I think maybe I don't need GDAL_jll.jl, looks like GDAL.jl (for which ArchGDAL is a wrapper) builds from binaries for you.
```julia
julia> @time b = ArchGDAL.read(ArchGDAL.read("../96m/cellmap.asc"), 1);
```

@ranjanan why do you have to wrap `ArchGDAL.read` with `ArchGDAL.read` here?
EDIT: never mind, I think I've got it. It looks like calling `ArchGDAL.read` on a filename creates an ArchGDAL dataset, and `read` has another method that gets the actual values from that dataset.
Alright! Got a memory allocation profile for the functions called in Omniscape.
There appear to be some very obvious culprits.
The first one I identified off the bat was `clip()`, and I've got an idea for how to significantly reduce that function's memory demands.
Aren't those operations just creating large arrays?
Also, `clip` could easily avoid array operations by using a loop, but the garbage generated should get collected anyway. Curious to see what helps.
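For what it's worth, here is a loop-based sketch of that idea (the signature and circular-window semantics are hypothetical, not Omniscape's actual `clip`): write into a preallocated output instead of building intermediate mask arrays.

```julia
# Hypothetical in-place clip: keep cells within `radius` of (c_row, c_col),
# set everything else to the -9999 nodata value, reusing `out` each call.
function clip!(out::Matrix{Float64}, src::Matrix{Float64},
               c_row::Int, c_col::Int, radius::Int)
    @inbounds for j in axes(src, 2), i in axes(src, 1)
        if (i - c_row)^2 + (j - c_col)^2 <= radius^2
            out[i, j] = src[i, j]
        else
            out[i, j] = -9999.0
        end
    end
    return out
end

src = ones(5, 5)
out = similar(src)   # allocated once, reusable across moving-window steps
clip!(out, src, 3, 3, 1)
```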
What's going on with the `null_current_total` line?
`null_current_total` has to do with an artifact correction when using "blocks" as moving window centers instead of individual pixels. I basically translated what Brad had already implemented in Omniscape.py into Julia. I'll have to revisit his code to see if there's a better way I can implement it.
I'm wondering if GC is not working efficiently? For example, memory that was allocated during `readdlm` does not seem to be garbage collected (AFAICT). Switching to a function other than `readdlm` could solve that specific issue, but I was just using it as an example.
Also, maybe that `null_current_total` allocation is so large because the loop caused that specific line to be evaluated 81*81 times in the run I tested on? New to memory profiling, so not sure exactly how it works :slightly_smiling_face:
Yes, that would be because it is allocating the array each time in a loop. That's probably leading to poor performance too.
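A minimal illustration of that pattern (the function names here are invented):

```julia
# Allocates a fresh array on every iteration of the loop:
function totals_allocating(n, iters)
    total = 0.0
    for _ in 1:iters
        buf = zeros(n)   # new allocation each pass -> GC pressure
        buf .= 1.0
        total += sum(buf)
    end
    return total
end

# Hoists the allocation out of the loop and reuses one buffer:
function totals_preallocated(n, iters)
    total = 0.0
    buf = zeros(n)       # allocated exactly once
    for _ in 1:iters
        buf .= 1.0       # overwrite in place
        total += sum(buf)
    end
    return total
end
```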
In some cases `replace!` may work better as a way to replace values with -9999 etc. Sometimes it may be slower but save memory. Worth trying.
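For example, `Base.replace!` rewrites values in place, while the non-mutating `replace` allocates a full copy (the values here are just illustrative):

```julia
# In place: rewrite nodata markers without allocating a new array
vals = [0.5, Inf, 1.5]
replace!(vals, Inf => -9999.0)

# The non-mutating version allocates a whole new array of the same size
vals2 = replace([0.5, Inf, 1.5], Inf => -9999.0)
```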
Inside the loops, depending on how small the inner loop is, it could be better to manually hoist `arguments["radius"]` outside the loop (assign it to a variable before the loop) so you don't have to repeatedly pay the lookup cost inside the loop.
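A quick sketch of that hoisting suggestion (this `arguments` dict is a stand-in for Omniscape's):

```julia
arguments = Dict("radius" => 10)

# Lookup inside the loop: pays the Dict hashing cost on every iteration,
# and the result is not concretely typed inside the loop body.
function sum_radii_inner(arguments, n)
    s = 0
    for _ in 1:n
        s += arguments["radius"]
    end
    return s
end

# Hoisted: one lookup, bound to a local before the loop starts.
function sum_radii_hoisted(arguments, n)
    radius = arguments["radius"]
    s = 0
    for _ in 1:n
        s += radius
    end
    return s
end
```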
In general, reducing memory usage may also improve performance.
Thanks so much for the suggestions and insights! I'll try some of those things out.
I am sure you have seen this, but many of those tips are related to memory usage.
Note to self: I think this deepcopy is entirely unnecessary.
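Where no copy is needed at all, a `view` can stand in for a `deepcopy` without allocating (a generic sketch, not the actual Omniscape line):

```julia
big = rand(1_000, 1_000)          # ~8 MB of Float64s
window = view(big, 1:100, 1:100)  # lightweight wrapper, shares big's memory
# deepcopy(big), or slicing with big[1:100, 1:100], would each allocate
# a fresh array; the view allocates essentially nothing.
```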
This massive memory using line is only run once before the main Omniscape program starts up in parallel, so there is some leeway.
Linking #36, which was just closed by #38
Omniscape is now running about 3-4 times faster in serial for a large problem I'm working with since applying some of these fixes, and I'm able to run it on 3 times more parallel processes! :tada:
Close?
Almost! Just checking a few more things.
Actually, I might as well close. I'm thinking a 32-bit option will save GBs of memory, since Omniscape will in some cases be working with up to 6 separate arrays (for large problems with climate connectivity enabled), each several GB in size. I'll create another issue for that, though!
Yes makes sense to do that now.
I'm getting out-of-memory errors using a 1.2 GB resistance surface on one worker, so I think a closer look needs to be taken at memory consumption. This is on a 32 GB RAM machine, so this is no good.
One idea off the bat is to calculate inputs to 32-bit precision, but I think there are likely many more gains to be had by tweaking the code itself.
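The back-of-envelope savings from 32-bit precision are easy to check, since halving element size halves array memory:

```julia
n = 1_000
a64 = zeros(Float64, n, n)
a32 = zeros(Float32, n, n)
sizeof(a64)  # 8_000_000 bytes
sizeof(a32)  # 4_000_000 bytes
# For a 1.2 GB Float64 surface, Float32 saves ~600 MB per array;
# with up to 6 such arrays in play, that's several GB total.
```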
In theory, the Circuitscape problems themselves should be quite small, <<1M pixels, so most of the consumption is probably happening on the Omniscape side of things.
@ranjanan and @ViralBShah I know you're both busy, but just cc'ing you here in case you have some tips or direction on how to go about addressing this.