EcoJulia / EcoJulia.org

https://ecojulia.org

Package name #4

Closed tpoisot closed 3 years ago

tpoisot commented 5 years ago

I'm re-packaging my code to get the bioclim variables at coordinates as a package -- @mkborregaard (and also @richardreeve and @kescobo), do you prefer BioClim or WorldClim as a package name? I feel like BioClim is close to the SDM model of the same name, but that should be an SDM.jl package if we eventually go there.

mkborregaard commented 5 years ago

WorldClim, since that's the name of the source of the data. Do they have some kind of web API you tap into? I tend to prefer the Chelsa climatologies to the kriging-based WorldClim ones - you can download a raster, and then I have a simple raster package (https://github.com/mkborregaard/VerySimpleRasters.jl) to extract values at coordinates.

tpoisot commented 5 years ago

They don't have an API, but it's easy to get the address to download it. I agree that Chelsa would probably be better (after having read a bit on how they reconstructed it). I also like the rasters package! I had something in progress, but it was not as full-featured.

richardreeve commented 5 years ago

I'm not sure we need to be hugely opinionated here... the official releases of Chelsa are available from DataDryad as far as I'm aware, so could be accessed via DataDeps.jl, generating the script to download it using DataDepsGenerators.jl - as far as I understand those packages - and I'm sure a similar thing could be done for WorldClim. Then we could have a really useful package that could automate the process of accessing this kind of data (maybe using VerySimpleRasters.jl) called something like ClimateData.jl perhaps? I haven't used DataDeps.jl myself, but I saw the talk at JuliaCon last year and it looked very cool...

PS I'm not saying all of this needs to be done now for @tpoisot's immediate release, but if we thought this kind of thing was useful, it could have a more generic name, and the functionality could be slowly extended...

mkborregaard commented 5 years ago

Sure - but I'm just trying to understand why that's smarter than just having a package that downloads (and possibly loads) the rasters? That could be very simple.

tpoisot commented 5 years ago

I like the idea of a function to just download the tiles, in a package called BioClimaticData.jl -- we could have methods like data[x], and data[x, n], which would give an array of values, or the nth variable, and x can be all sorts of things (a GBIFRecord, an EcoBase object, an AbstractPosition, ...)

@mkborregaard do you think VerySimpleRasters.jl is already in a state where we can do this? This would be an important stepping stone towards very fun stuff.

mkborregaard commented 5 years ago

It needs an inbuilt driver for GeoTIFF, and I don't want to depend on GDAL. That shouldn't be tough to write though. Do you know where the binary specification of the format is defined? Otherwise GDAL is MIT so I could port theirs.

How would your interface work with x - as a trait? I feel like we might think more about inheritance, e.g. linking EcoBase objects and GBIFRecords, possibly to AbstractPosition. I feel like depending on GeoInterface, having methods for AbstractPosition and Vector{<:AbstractPosition}, and then extending GeoInterface.coordinates so you would do data[coordinates(x)] would be a cleaner design?
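
A minimal sketch of that design idea, written in Python purely for illustration rather than in Julia and not using GeoInterface: each object type only needs a `coordinates` method, and the data layer is indexed by plain coordinates. All names here (`Position`, `OccurrenceRecord`, `GridLayer`, the grid layout) are hypothetical stand-ins, not types from any of the packages mentioned.

```python
from dataclasses import dataclass
from functools import singledispatch
import math


@dataclass(frozen=True)
class Position:
    """Hypothetical stand-in for an AbstractPosition-like type."""
    lon: float
    lat: float


@dataclass(frozen=True)
class OccurrenceRecord:
    """Hypothetical stand-in for an occurrence record with a location."""
    species: str
    longitude: float
    latitude: float


@singledispatch
def coordinates(x):
    """Return (lon, lat) for any supported object."""
    raise TypeError(f"no coordinates method for {type(x).__name__}")


@coordinates.register(Position)
def _(x):
    return (x.lon, x.lat)


@coordinates.register(OccurrenceRecord)
def _(x):
    return (x.longitude, x.latitude)


@coordinates.register(list)
def _(x):
    # A list of supported objects maps to a list of (lon, lat) pairs.
    return [coordinates(item) for item in x]


class GridLayer:
    """A regular grid of values; row 0 is the northernmost row (an assumption)."""

    def __init__(self, values, west, north, cell_size):
        self.values = values
        self.west = west
        self.north = north
        self.cell_size = cell_size

    def _lookup(self, lon, lat):
        col = math.floor((lon - self.west) / self.cell_size)
        row = math.floor((self.north - lat) / self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise IndexError(f"({lon}, {lat}) is outside the layer")
        return self.values[row][col]

    def __getitem__(self, key):
        # The layer only ever sees coordinates, never the original objects.
        if isinstance(key, list):
            return [self._lookup(lon, lat) for lon, lat in key]
        lon, lat = key
        return self._lookup(lon, lat)
```

With this split, supporting a new object type means adding one `coordinates` method, and a call looks like `layer[coordinates(record)]` or `layer[coordinates([a, b])]`.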

mkborregaard commented 5 years ago

Ah - so GDAL simply wraps the binary GeoTIFF format, which is a binary dependency in its own right. That would of course be nice to avoid but the format is not simple to implement: https://www.geospatialworld.net/article/geotiff-a-standard-image-file-format-for-gis-applications/ In fact the geotiff C library is fairly big. So the answer is, no, VerySimpleRasters is currently not up to this task (its big limitation is that it only supports like 2 raster formats), and for now you might be better off using GDAL directly, like you did in the bioclim example. BTW at the IBS there was a Japanese research group telling me that they had initiated work on a comprehensive SDM package for Julia.

richardreeve commented 5 years ago

It could be very simple, but as far as I recall from the talk, DataDeps.jl handles two things you haven't mentioned:

  1. caching the data seamlessly
  2. referencing the material so it doesn't look like it's your data
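
A minimal sketch of those two behaviours, written in Python for illustration and not using the DataDeps.jl API: the file is downloaded only when it is not already in the cache, and the citation is shown at download time. `Dataset`, `fetch`, `download`, and the cache layout are all hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path
import urllib.request


@dataclass(frozen=True)
class Dataset:
    """Hypothetical description of a remote data file and how to credit it."""
    name: str
    url: str
    citation: str


def download(url, destination):
    """Default downloader: fetch `url` and write it to `destination`."""
    urllib.request.urlretrieve(url, destination)


def fetch(dataset, cache_dir, downloader=download):
    """Return the local path of `dataset`, downloading it only on first use."""
    cache_dir = Path(cache_dir)
    target = cache_dir / dataset.name
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        # Point 2: make the provenance visible when the data is first obtained.
        print(f"Downloading {dataset.name}. Please cite: {dataset.citation}")
        downloader(dataset.url, target)
    # Point 1: later calls reuse the cached file without touching the network.
    return target
```

Passing the downloader as an argument keeps the caching logic testable without network access.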

mkborregaard commented 5 years ago

Chelsa doesn't have tiles :-/

mkborregaard commented 5 years ago

We could ask @dirkkarger what the preferred way is to provide programmatic access to Chelsa data?

mkborregaard commented 5 years ago

GeoTIFF uses a data compression scheme that we cannot memory-map from Julia. So @tpoisot the best approach is probably to use your geotiff parser and accept the GDAL dep?

pdimens commented 3 years ago

Since this repo is for the EcoJulia site, this thread can be closed.