Open gottacatchenall opened 3 years ago
my hacky workaround atm is
function make_same_dims(input::IT, target::TT) where {IT,TT <: SimpleSDMLayer}
lat, long = collect(longitudes(target)), collect(latitudes(target))
newgrid = similar(target)
for lt in lat, lg in long
newgrid[lt,lg] = input[lt,lg]
end
newgrid
end
That's a good point, but ideally something we might want to do with some sort of spatial interpolation. There's a way to get the haversine distance, do we can use this. I wouldn't necessarily call it rescale, maybe reproject? There's a little more thinking to do on this one before moving to a PR.
FWIW, GeoData calls this "resample" and forwards the functionality to ArchGDAL.gdalwarp
That sounds like something we might do -- my current rough draft is
using SimpleSDMLayers
using StatsPlots
using Statistics
ref = SimpleSDMPredictor(WorldClim, BioClim, 3; left=-20.0, right=40.0, top=60.0, bottom=30.0)
target = SimpleSDMResponse(zeros(Float32, 200, 200), -12.056, 37.065, 33.1234, 55.06)
for p in keys(target)
if isnothing(ref[p])
target[p] = nothing
else
vals = SimpleSDMLayers._sliding_values(ref, p, 20.0)
target[p] = isempty(vals) ? nothing : mean(vals)
end
end
@gottacatchenall , given the info @mkborregaard provided, do you want to see if a version based on ArchGDAL.gdalwarp
is possible? It's already a dependency, and we can write/read temp geotiff files...
@tpoisot Yeah GeoData.resample
wraps it with arguments for interpolation rules, it isn't currently a dependency though
i've also written a few functions to take a set of SimpleSDMLayers
and return PCA layers which i could include in the same PR
Two PRs! Mostly because I'd like to do some solid work on the PCA, to make sure we can drop layers into multivariate stats pretty much directly.
Also I wouldn't bring too much stuff from Geo until they figure out whether they use the Geometry types or not...
@tpoisot i'm not sure if GeoData.jl is affiliated with JuliaGeo, here's its deps, which don't seem too heavy
also in progress on opening prs for both PCA and this, from what i can tell interfacing with Mvstats is pretty easy
Re. GeoData that's still a lot of deps. I'm curious to see how far we can go using ArchGDAL only, since we depend on that no matter what.
As far as I can tell resample
only uses ArchGDAL functions.I could port it, but it would effectively be copy/pasting which isn't ideal.
I suppose long-term the julian-esque would be to move all functions that rely on ArchGDAL to a shared lib and have any other heavy dependencies loaded from higher level packages
Thoughts? @tpoisot @mkborregaard @rafaqz
So I don't think it would make sense to take a GeoData dep just to get resample. GeoData is a general raster package, very similar to R's terra (and their old raster package), and as such has a lot of overlapping functionality with SimpleSDMLayers (such as masking, cropping, extracting data, downloading data, aggregating, opening raster files etc etc). GeoData is more focused on being a general raster package, though, whereas in my understanding SimpleSDMLayers is focused on creating a smooth user experience for fitting SDM models.
One could consider having SimpleSDMLayers depend on GeoData, which would give all the raster operations here for free, and I guess make it easier to maintain, but that would change the focus of the package to be explicitly on creating the smooth SDM interface. That depends completely on @tpoisot 's vision for this package.
I think most of the added deps for geodata (e.g. DiskArrays, Flatten, Mmap, ) relate to the functionality for working with on-disk rather than in-memory rasters.
When I originally posted here my suggestion was that you could be inspired (or essentially copy) GeoData's resample function, which should also be doable.
I agree with that assessment @mkborregaard . Depending on GeoData.jl would be a larger change in philosophy and more like a decision to reduce the codebase here by half. GeoData generalizes many of the methods here, including resample
.
Proper broadcasting, base array methods and the Tables.jl interface in GeoData.jl would also save work here. These things are hard to do properly.
So I wouldn't add a GeoData.jl dep just for resample
. You can just copy code from GeoData.jl, and delete all of the generalization code. resample
is just a wrapper to warp
which is n-dimensionally generalised gdalwarp
. Although @gottacatchenall I tend to agree that copy-pasting doesn't seem like a sustainable long-term strategy.
In terms of JuliaGeo, GeoData is no more affiliated than ArchGDAL is - its not a JuliaGeo package. Its my package, partly dog-fed by cesaraustralia but to do generic things better, not specifically to work with cesar packages.
It's also less tightly integrated with GeoInterface.jl than ArchGDAL is. I agree about the GeoInterface/GeometryBasics problem and will likely be involved in the solution in the coming months. But I don't think that's an insurmountable problem currently.
Many of the dependencies like DiskArrays/GeoInterface/ColorTypes/GeoFormatTypes are already ArchGDAL deps. HDF5 and NCDatasets are included because Requires.jl is not a great solution to handle dependency bounds, and we dont have proper glue packages yet. Please chip in there too if you would like the dependency problem resolved, I've used up my quota pushing that...
Yeah for now I think copying warp
will work
Yep. GeoData.jl is not going anywhere if you want to cut some code and upgrade the tooling and functionality here later on. I just hope we can aim for integration rather than copy-paste as a long term goal.
One could consider having SimpleSDMLayers depend on GeoData, which would give all the raster operations here for free, and I guess make it easier to maintain, but that would change the focus of the package to be explicitly on creating the smooth SDM interface. That depends completely on @tpoisot 's vision for this package.
Medium-term, I don't think GeoData will become a dependency of this package. As you'll see in the manuscript we're preparing (on network distribution models), SimpleSDMLayers does things in a different way and is, therefore, both less and more flexible depending on the use-case. Functionalities as they are currently cover our research needs, so don't expect changes in development direction soon.
The overlap in data download is because we couldn't agree on an interface, and that ship has sailed, meaning that there are no plans to remove these functionalities from SimpleSDMLayers in order to use RasterDataSources either.
Great, yeah then it definitely makes more sense to just copy the implementation over :-)
I suppose long-term the julian-esque would be to move all functions that rely on ArchGDAL to a shared lib and have any other heavy dependencies loaded from higher level packages
@gottacatchenall I have thought about this with warp
... it could start by wrapping the arguments to gdalwarp
actually in ArchGDAL.jl instead of in GeoData.jl. ArchGDAL.jl kind of already is that shared lib you're talking about. But it needs a Dict
based front-end to GDAL arguments rather than the current Vector
of String
approach. What I've written in warp
could really be moved there. It's just probably going to be a bigger job doing that consistently for the whole package than it is in GeoData.jl.
Looks like most of the calls to GDAL.jl happen in this file. Seems that if we changed the ArchGDAL wrappers (unsafe_gdalwarp
, etc.) to pass the options
keyword to a function that converts from whatever the options
input type is to the vector of strings required by GDAL.jl
, then this could work
I'm pretty sure this has been discussed as a thing ArchGDAL needs already, so a PR would probably be appreciated.
Here's the code I'm using for a project where we need to do this:
function coerce(template::TT, source::TS, f) where {TS<:SimpleSDMLayer,TT<:SimpleSDMLayer}
coerced = similar(template)
for k in keys(template)
coerced[k] = f(clip(source, k .- stride(template), k .+ stride(template)))
end
return coerced
end
There are a few possible adjustments to make, but I wouldn't be opposed to adding this to a new release.
Description of the to-do item
As far as I can tell, there is not a function to take a raster and change its resolution to match the dimensions of a target raster. Doing this would be straightforward by iterating over the
longitudes(a)
,latitudes(a)
in the smaller raster and then access the values of the other raster at that coord, and have thegetindex(b::SimpleSDMLayer, lat, long)
handle that conversion.When trying to do this I found
rescale(layer1::T, layer2::T) where {T <: SimpleSDMLayer}
,which scales the values of a given raster to match a target, but not dimensions. To avoid this semantic ambiguity, perhaps the methodrescale
could rescale both values and dimensions by default with kw arguments to turn either off