Circuitscape / Circuitscape.jl

Algorithms from circuit theory to predict connectivity in heterogeneous landscapes
https://circuitscape.org
MIT License

Compress grids has no effect #168

Closed kearney-sp closed 3 years ago

kearney-sp commented 5 years ago

When "compress_grids = True", output current maps are not compressed in Circuitscape.jl. I have only tried running pairwise mode. In Circuitscape 4.05, output is compressed into .gz format as expected when running the same .ini file.

ranjanan commented 5 years ago

Yeah, you're right. I forgot to implement that. Apologies. PR for that will be up soon.

kearney-sp commented 5 years ago

wonderful - thanks!

kearney-sp commented 5 years ago

On a similar note: how hard would it be to output grids as integers instead of floats? In pairwise mode, the output values of each current map are between 0 and 1. Couldn't you just multiply them by, say, 10000 and output an integer instead? This would save a lot of space when outputting many individual current maps. It could even be a separate flag. One complication is that the cumulative current map may then exceed the maximum integer value, but you could log transform it or compute the cumulative map on the original 0-1 values.
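A minimal sketch of that idea, assuming values in [0, 1] and a scale factor of 10000. The helper name `scale_to_uint16` and the `factor` keyword are hypothetical and are not part of Circuitscape.jl. Each map is stored as 16-bit integers, while the cumulative map is summed from the original floating-point values so it cannot overflow the integer range.

```julia
# Hypothetical helper: store values in [0, 1] as 16-bit integers.
# `factor` is an illustrative choice, not a Circuitscape setting.
function scale_to_uint16(x::AbstractArray{<:Real}; factor::Integer = 10_000)
    # clamp guards against values slightly outside [0, 1]
    return round.(UInt16, clamp.(x, 0, 1) .* factor)
end

maps = [rand(4, 4) for _ in 1:3]     # stand-ins for per-pair current maps
scaled = scale_to_uint16.(maps)      # 2 bytes per cell instead of 8
cumulative = sum(maps)               # accumulate on the original Float64 values

println(sizeof(maps[1]), " bytes -> ", sizeof(scaled[1]), " bytes per map")
```

The saving shown here applies to binary storage. For a text format such as .asc, the file size depends on the number of characters written rather than on the integer width.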

ranjanan commented 5 years ago

Sure. You could get the same savings by just rounding to fewer digits than the 8 digits I currently write (https://github.com/Circuitscape/Circuitscape.jl/blob/master/src/out.jl#L349). I ran a simple experiment to verify this.

julia> a = rand(10^6); 

julia> using DelimitedFiles

julia> writedlm("test1", a)

julia> writedlm("test2", round.(a,digits=3))

shell> du -sm test1 test2
19  test1
6   test2

If you're ok with this idea, I can implement it.
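The experiment above can be repeated for several digit settings without leaving Julia. This is only a sketch: `size_by_digits` is a hypothetical helper, the file names are arbitrary, and `filesize` stands in for `du`.

```julia
using DelimitedFiles

# Write the same vector at several rounding levels and record the file sizes.
function size_by_digits(a::AbstractVector{<:Real}, digit_levels)
    dir = mktempdir()                 # temporary directory, removed at process exit
    sizes = Dict{Int,Int}()
    for d in digit_levels
        path = joinpath(dir, "digits_$d.txt")
        writedlm(path, round.(a, digits = d))
        sizes[d] = filesize(path)     # size in bytes
    end
    return sizes
end

sizes = size_by_digits(rand(10^5), (8, 5, 3))
for d in sort(collect(keys(sizes)); rev = true)
    println("digits = $d: ", sizes[d], " bytes")
end
```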

ranjanan commented 5 years ago

There's also a really experimental single precision mode that you can try. You can use it by specifying precision = single in your INI file. This uses 32-bit floats in all the calculations instead of the standard 64-bit floats. However, I don't know how accurate the answers are. Maybe you can try this too?
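For a rough sense of what the narrower type means for storage, the sketch below compares Float64 and Float32 on random data. It only measures the rounding introduced by converting stored values; it says nothing about how errors accumulate inside the solver, which is the part whose accuracy is unknown.

```julia
x64 = rand(1_000)        # Float64 values in [0, 1)
x32 = Float32.(x64)      # the same values stored as 32-bit floats

println("Float64: ", sizeof(x64), " bytes, Float32: ", sizeof(x32), " bytes")
println("machine epsilon: ", eps(Float64), " vs ", eps(Float32))

# Largest absolute error introduced by the conversion alone
max_err = maximum(abs.(Float64.(x32) .- x64))
println("max conversion error: ", max_err)
```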

kearney-sp commented 5 years ago

I have experimented with the ‘precision = single’ flag in my INI file and get strange results: values near focal nodes are much higher relative to more distant nodes than with the ‘precision = double’ option. I think what would be ideal (for my application anyway) would be to reduce the precision only after the linear solve, just before writing the current map for that focal node pair. Basically, rescale the results (say via a log transform and conversion to integer) and then write that. Then you could probably get down to 16 bits. My problem isn’t memory, but rather the disk space needed to write all of the individual current maps! I would like them all written in order to do some custom weighting in post-processing.

Would this be a simple implementation? Rather than round, do a log transform, multiply by some factor (say 1000) and convert to integer?
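A minimal sketch of that encoding, assuming current values in (0, 1], a base-10 log, and a factor of 1000. The names `NODATA`, `encode_log`, and `decode_log` are hypothetical and do not exist in Circuitscape.jl. One 16-bit code is reserved for cells with no positive value, and everything else is stored as round(-log10(x) * 1000).

```julia
const NODATA = typemax(UInt16)   # 65535, reserved for zero or missing cells (illustrative)

# Encode a value in (0, 1] as round(-log10(x) * factor), stored in 16 bits.
function encode_log(x::Real; factor::Real = 1000)
    x > 0 || return NODATA                       # zero, negative, or NaN
    v = clamp(-log10(x) * factor, 0.0, 65534.0)  # keep inside the UInt16 range
    return round(UInt16, v)
end

# Invert the encoding; NODATA decodes to 0.0.
decode_log(q::UInt16; factor::Real = 1000) =
    q == NODATA ? 0.0 : 10.0^(-Float64(q) / factor)

vals = [1.0, 0.0123, 1e-9, 0.0]
encoded = encode_log.(vals)
decoded = decode_log.(encoded)
println(encoded)
println(decoded)
```

With a factor of 1000, one integer step corresponds to 0.001 in log10 units, so the decoded value is within roughly 0.12% of the original, and values down to about 1e-65 still fit in 16 bits.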

ranjanan commented 5 years ago

Sure, but I think the right approach would be to introduce a flag in the INI file that controls the number of digits of precision the user wants in their output current map. That accomplishes the same thing, plus you don't need to rescale (it doesn't make a difference whether you're writing integers or floating-point numbers if you're writing the same number of digits).

Also, would you like to try your hand at PR? :)

kearney-sp commented 5 years ago

OK, that makes sense. I am not used to working with .asc, to be honest! If I’m not mistaken, I think there will still need to be a rescaling step, right? Otherwise, very small values will just get set to ‘0’, rather than something near the lowest possible number with the defined number of digits.

And what is PR?? Maybe?! :)

ranjanan commented 5 years ago

I believe people logscale their maps when they get really tiny values. They set the log_transform_maps flag to true. And then I end up writing 8 digits of precision of the log transformed map. You could control this 8 via another appropriately named flag, if you'd like to save space. Does this approach make sense?
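A small sketch of why the log transform addresses the concern about tiny values being written as 0. It assumes a base-10 log and uses 3 digits purely as an example; the sample values are made up.

```julia
vals = [0.5, 1e-4, 1e-12]                   # illustrative current values

plain  = round.(vals, digits = 3)           # tiny values collapse to 0.0
logged = round.(log10.(vals), digits = 3)   # tiny values stay distinguishable

println(plain)
println(logged)
```

Rounding after the transform keeps a fixed relative resolution across many orders of magnitude, whereas rounding the raw values keeps a fixed absolute resolution.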

> And what is PR?? Maybe?! :)

PR stands for "pull request". It's just a way of asking you for code contribution. :-) Here's a simple guide on how to create one.

kearney-sp commented 5 years ago

That seems like it would work well then - a flag to control precision. Between that and compression, the user would have a lot of control over output file size.

One more thought: in my case, however, I would like to differentiate between values lower than the lowest allowable value (given the precision) and values that have been masked. Can you think of a way to easily differentiate between these? One option would be to let the user choose whether values below the precision are set to the minimum value or to zero (this would require yet another flag). Otherwise, they will get set to zero and then to NA once log transformed, the same as the masked values.
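One way the "set to the minimum value" option could look, as a sketch only. The helper `round_with_floor` is hypothetical, and the nodata value of -9999 is an assumption made for illustration. Positive values that would round to zero are raised to the smallest step the chosen precision can represent, so they remain distinct from masked cells.

```julia
# Hypothetical helper: round to `digits` decimals, but keep positive values
# that would round to zero at the smallest representable step instead.
function round_with_floor(x::Real, digits::Integer; nodata::Real = -9999.0)
    x == nodata && return float(nodata)   # masked cells pass through unchanged
    x > 0 || return 0.0                   # true zeros stay zero
    minval = 1 / 10.0^digits              # smallest nonzero value at this precision
    return max(round(x, digits = digits), minval)
end

cells = [0.12345, 1e-9, 0.0, -9999.0]
println(round_with_floor.(cells, 3))
```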

> PR stands for "pull request". It's just a way of asking you for code contribution. :-) Here's a simple guide on how to create one.

Gotchya - I will look at the pull request guide! This is my first foray into Julia, but perhaps in the near future I can contribute.

ViralBShah commented 5 years ago

Let's close if implemented.

vlandau commented 4 years ago

Now that .tif writing is implemented on master you can get some pretty good file size savings by setting write_as_tif = true in the .ini. GeoTIFFs will be written with lossless LZW compression by default, which will make file sizes much smaller than an equivalent ASCII. That plus the single precision option now implemented sufficiently addresses file size issues in general IMO (when single precision is used, 32-bit tiffs will be written instead of 64-bit, saving even more space).

vlandau commented 3 years ago

Close due to implementation of compression when writing as tif?