PoisotLab / SimpleSDMLayers.jl

Simple layers for species distribution modeling and bioclimatic data
https://docs.ecojulia.org/SimpleSDMLayers.jl/stable/
MIT License
19 stars 2 forks source link

✅ resizing one layer to the size of another, and ambiguity in `rescale` function meaning #131

Open gottacatchenall opened 3 years ago

gottacatchenall commented 3 years ago

Description of the to-do item

As far as I can tell, there is not a function to take a raster and change its resolution to match the dimensions of a target raster. Doing this would be straightforward by iterating over the longitudes(a), latitudes(a) in the smaller raster and then access the values of the other raster at that coord, and have the getindex(b::SimpleSDMLayer, lat, long) handle that conversion.

When trying to do this I found rescale(layer1::T, layer2::T) where {T <: SimpleSDMLayer},which scales the values of a given raster to match a target, but not dimensions. To avoid this semantic ambiguity, perhaps the method rescale could rescale both values and dimensions by default with kw arguments to turn either off

gottacatchenall commented 3 years ago

my hacky workaround atm is

function make_same_dims(input::IT, target::TT) where {IT,TT <: SimpleSDMLayer}
    lat, long = collect(longitudes(target)), collect(latitudes(target))
    newgrid = similar(target)
    for lt in lat, lg in long
        newgrid[lt,lg] = input[lt,lg]
    end
    newgrid
end
tpoisot commented 3 years ago

That's a good point, but ideally something we might want to do with some sort of spatial interpolation. There's a way to get the haversine distance, do we can use this. I wouldn't necessarily call it rescale, maybe reproject? There's a little more thinking to do on this one before moving to a PR.

mkborregaard commented 3 years ago

FWIW, GeoData calls this "resample" and forwards the functionality to ArchGDAL.gdalwarp

tpoisot commented 3 years ago

That sounds like something we might do -- my current rough draft is

using SimpleSDMLayers
using StatsPlots
using Statistics

ref = SimpleSDMPredictor(WorldClim, BioClim, 3; left=-20.0, right=40.0, top=60.0, bottom=30.0)

target = SimpleSDMResponse(zeros(Float32, 200, 200), -12.056, 37.065, 33.1234, 55.06)

for p in keys(target)
    if isnothing(ref[p])
        target[p] = nothing
    else
        vals = SimpleSDMLayers._sliding_values(ref, p, 20.0)
        target[p] = isempty(vals) ? nothing : mean(vals)
    end
end

diff overlap

@gottacatchenall , given the info @mkborregaard provided, do you want to see if a version based on ArchGDAL.gdalwarp is possible? It's already a dependency, and we can write/read temp geotiff files...

gottacatchenall commented 3 years ago

@tpoisot Yeah GeoData.resample wraps it with arguments for interpolation rules, it isn't currently a dependency though

i've also written a few functions to take a set of SimpleSDMLayers and return PCA layers which i could include in the same PR

tpoisot commented 3 years ago

Two PRs! Mostly because I'd like to do some solid work on the PCA, to make sure we can drop layers into multivariate stats pretty much directly.

tpoisot commented 3 years ago

Also I wouldn't bring too much stuff from Geo until they figure out whether they use the Geometry types or not...

gottacatchenall commented 3 years ago

@tpoisot i'm not sure if GeoData.jl is affiliated with JuliaGeo, here's its deps, which don't seem too heavy

also in progress on opening prs for both PCA and this, from what i can tell interfacing with Mvstats is pretty easy

tpoisot commented 3 years ago

Re. GeoData that's still a lot of deps. I'm curious to see how far we can go using ArchGDAL only, since we depend on that no matter what.

gottacatchenall commented 3 years ago

As far as I can tell resample only uses ArchGDAL functions.I could port it, but it would effectively be copy/pasting which isn't ideal.

I suppose long-term the julian-esque would be to move all functions that rely on ArchGDAL to a shared lib and have any other heavy dependencies loaded from higher level packages

Thoughts? @tpoisot @mkborregaard @rafaqz

mkborregaard commented 3 years ago

So I don't think it would make sense to take a GeoData dep just to get resample. GeoData is a general raster package, very similar to R's terra (and their old raster package), and as such has a lot of overlapping functionality with SimpleSDMLayers (such as masking, cropping, extracting data, downloading data, aggregating, opening raster files etc etc). GeoData is more focused on being a general raster package, though, whereas in my understanding SimpleSDMLayers is focused on creating a smooth user experience for fitting SDM models.

One could consider having SimpleSDMLayers depend on GeoData, which would give all the raster operations here for free, and I guess make it easier to maintain, but that would change the focus of the package to be explicitly on creating the smooth SDM interface. That depends completely on @tpoisot 's vision for this package.

I think most of the added deps for geodata (e.g. DiskArrays, Flatten, Mmap, ) relate to the functionality for working with on-disk rather than in-memory rasters.

When I originally posted here my suggestion was that you could be inspired (or essentially copy) GeoData's resample function, which should also be doable.

rafaqz commented 3 years ago

I agree with that assessment @mkborregaard . Depending on GeoData.jl would be a larger change in philosophy and more like a decision to reduce the codebase here by half. GeoData generalizes many of the methods here, including resample.

Proper broadcasting, base array methods and the Tables.jl interface in GeoData.jl would also save work here. These things are hard to do properly.

So I wouldn't add a GeoData.jl dep just for resample. You can just copy code from GeoData.jl, and delete all of the generalization code. resample is just a wrapper to warp which is n-dimensionally generalised gdalwarp. Although @gottacatchenall I tend to agree that copy-pasting doesn't seem like a sustainable long-term strategy.

In terms of JuliaGeo, GeoData is no more affiliated than ArchGDAL is - its not a JuliaGeo package. Its my package, partly dog-fed by cesaraustralia but to do generic things better, not specifically to work with cesar packages.

It's also less tightly integrated with GeoInterface.jl than ArchGDAL is. I agree about the GeoInterface/GeometryBasics problem and will likely be involved in the solution in the coming months. But I don't think that's an insurmountable problem currently.

Many of the dependencies like DiskArrays/GeoInterface/ColorTypes/GeoFormatTypes are already ArchGDAL deps. HDF5 and NCDatasets are included because Requires.jl is not a great solution to handle dependency bounds, and we dont have proper glue packages yet. Please chip in there too if you would like the dependency problem resolved, I've used up my quota pushing that...

gottacatchenall commented 3 years ago

Yeah for now I think copying warp will work

rafaqz commented 3 years ago

Yep. GeoData.jl is not going anywhere if you want to cut some code and upgrade the tooling and functionality here later on. I just hope we can aim for integration rather than copy-paste as a long term goal.

tpoisot commented 3 years ago

One could consider having SimpleSDMLayers depend on GeoData, which would give all the raster operations here for free, and I guess make it easier to maintain, but that would change the focus of the package to be explicitly on creating the smooth SDM interface. That depends completely on @tpoisot 's vision for this package.

Medium-term, I don't think GeoData will become a dependency of this package. As you'll see in the manuscript we're preparing (on network distribution models), SimpleSDMLayers does things in a different way and is, therefore, both less and more flexible depending on the use-case. Functionalities as they are currently cover our research needs, so don't expect changes in development direction soon.

The overlap in data download is because we couldn't agree on an interface, and that ship has sailed, meaning that there are no plans to remove these functionalities from SimpleSDMLayers in order to use RasterDataSources either.

mkborregaard commented 3 years ago

Great, yeah then it definitely makes more sense to just copy the implementation over :-)

rafaqz commented 3 years ago

I suppose long-term the julian-esque would be to move all functions that rely on ArchGDAL to a shared lib and have any other heavy dependencies loaded from higher level packages

@gottacatchenall I have thought about this with warp... it could start by wrapping the arguments to gdalwarp actually in ArchGDAL.jl instead of in GeoData.jl. ArchGDAL.jl kind of already is that shared lib you're talking about. But it needs a Dict based front-end to GDAL arguments rather than the current Vector of String approach. What I've written in warp could really be moved there. It's just probably going to be a bigger job doing that consistently for the whole package than it is in GeoData.jl.

gottacatchenall commented 3 years ago

Looks like most of the calls to GDAL.jl happen in this file. Seems that if we changed the ArchGDAL wrappers (unsafe_gdalwarp, etc.) to pass the options keyword to a function that converts from whatever the options input type is to the vector of strings required by GDAL.jl, then this could work

rafaqz commented 3 years ago

I'm pretty sure this has been discussed as a thing ArchGDAL needs already, so a PR would probably be appreciated.

tpoisot commented 2 years ago

Here's the code I'm using for a project where we need to do this:

function coerce(template::TT, source::TS, f) where {TS<:SimpleSDMLayer,TT<:SimpleSDMLayer}
    coerced = similar(template)
    for k in keys(template)
        coerced[k] = f(clip(source, k .- stride(template), k .+ stride(template)))
    end
    return coerced
end

There are a few possible adjustments to make, but I wouldn't be opposed to adding this to a new release.