I discovered a few gotchas while setting up a local test repo:
We assume [default.datadir]/data or [default.datadir] in a few places; this is too loose and should be made consistent (this applies to both raadtools and raadsync).
A wget installer is required.
wget accept patterns behave differently on Windows: compare "--accept=\"south/nt" with patterns that lack the asterisks and slash. Try "--accept=\"_nt2016\"" vs. "--accept=\"_south/nt2016\"" on Linux and Windows (the first works on Windows, the second does not).
It's not clear whether the NSIDC data set is southern-only; if it is not, it should be clearly named to distinguish the two (I haven't checked properly).
We also need R tools to do all the boring guff around the file list: applying method filters and date filters, maybe as data-source-specific handlers?
raadtools relies on a root-less full file path. Currently I use gsub to remove the root, but I got caught out by the "\" winslash; we could wrap this to avoid issues (or, more smartly, separate the file path from the root).
Maybe every data set needs its own R object: a function that stores all this guff and builds the config ...
Finally, this took me a few hours and it should be much simpler. The download for the 2015-2016 sea ice took minutes, and that should be the bulk of the time people have to wait. Maybe we need to separate the use of raadsync as a maintenance tool (for a permanent collection) from its use for off-the-cuff, once-off downloads for ad hoc usage.
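As a rough illustration of the gsub gotcha in the list above, the root could be stripped by a small wrapper along these lines. This is only a sketch: strip_root() is a hypothetical helper, not part of raadtools or raadsync, and the example paths are made up. It converts backslashes to forward slashes first and then compares the root literally rather than as a regular expression.

```r
# Hypothetical helper (not in raadtools/raadsync): remove a root directory
# from full file paths, tolerating Windows "\" separators.
strip_root <- function(fullname, root) {
  # Use "/" everywhere; fixed = TRUE treats "\" as a literal character
  fullname <- gsub("\\", "/", fullname, fixed = TRUE)
  root <- gsub("\\", "/", root, fixed = TRUE)
  # Drop any trailing slash so "C:/data" and "C:/data/" behave the same
  root <- sub("/+$", "", root)
  # Only strip paths that really start with the root, compared literally
  has_root <- startsWith(fullname, paste0(root, "/"))
  fullname[has_root] <- substring(fullname[has_root], nchar(root) + 2L)
  fullname
}

# Made-up paths: the first is under the root, the second is not
strip_root(c("C:\\data\\seaice\\nt_2016.bin", "/other/place/file.bin"), "C:/data/")
```

Paths that do not sit under the root are returned unchanged (apart from the slash conversion), which makes a mismatch easy to spot.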
I consider this raadsync territory, not to pass the buck but because raadsync needs to own the collection - it might be used by other toolkits, and it needs all this. Consider this issue a placeholder for assigning tasks.
Also, the following is a good example to promote the tools with.
raadsync from scratch
At the Australian Antarctic Division we maintain a collection of publicly available data sets for general use. To maintain and read these data we use the R for Australian Antarctic Division (RAAD) packages raadsync and raadtools.
This document describes the capabilities of the tools used to build, maintain and use these data sets, and highlights the exciting new interactive visualizations provided by mapview.
A good example data set is the NSIDC 25 km passive microwave sea ice concentration. Here we
use the RAAD tools to build up a short set of the time series in a local collection, and
read the data and build a visualization with mapview.
The RAAD packages may be installed with devtools from GitHub (please note that they live in different GitHub repositories).
Data repository - administrator task
Register a location for the data to be stored locally; this can be anywhere that is writable by the maintainer of the collection.
NOTE: Before running this code, please be aware that these tools are designed to download very long time series for dozens of data sets. A single collection can amount to many gigabytes of files; our own holdings run to several terabytes because we tend to register every available data set and keep them all complete and up to date. Not everybody can do this! It's for shared resources at a large research institute.
That said, the sea ice concentration data is relatively small and can be obtained on its own. The download below is ...
Here we put a new file in our local user directory and set it as the default location understood by RAAD. This should normally be a shared location for general usage, but a local user installation is reasonable here so that the example is as widely usable as possible.
Load raadsync. The first time, if this is being done interactively, you need to confirm a setting for caching; just enter "Y".
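A minimal sketch of registering such a location follows. It makes two assumptions that the text does not confirm: that the location is a directory, and that RAAD picks it up from the R option default.datadir (the name used in the notes at the top). The path is hypothetical, and a temporary directory is used so the sketch is safe to run anywhere; a real setup would point somewhere persistent.

```r
# Hypothetical location; replace tempdir() with a persistent, writable place
datadir <- file.path(tempdir(), "raad_data")
dir.create(datadir, recursive = TRUE, showWarnings = FALSE)

# Assumption: the default location is held in the "default.datadir" option
options(default.datadir = datadir)

# Confirm the registered location exists and is writable (0 means success)
file.access(getOption("default.datadir"), mode = 2) == 0
```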
library(raadsync)
Read the built-in default config file and process it for only the sea ice data of interest. We use "NULL" for the local config as we are not overriding any defaults.
cfg <- read_repo_config(local_config_file = NULL)
Explore this configuration data set.
Nothing is set to synchronize.
any(cfg$do_sync)
What data sets are about "ice"?
grep("ice", cfg$name, ignore.case = TRUE, value = TRUE)
We want the NSIDC SMMR-SSM/I Nasateam sea ice concentration, though we will be selective and only obtain the southern hemisphere and a recent part of the time series, to save time and storage. These data are excellent for exploration: they are relatively low-volume, delivered in a straightforward binary format on the native Polar Stereographic map projection, and have complete spatial coverage for a complete daily time series from 1979 to now. The northern and southern poles are stored separately, and we exclude the north here.
Investigate the download options and modify them to suit; we only want recent data and the southern hemisphere.
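Picking out the rows of interest from a configuration table might look like the sketch below. It runs on a toy stand-in rather than the real output of read_repo_config(): only the column names (name, do_sync, method_flags) are taken from the code in this example, while the data set names and flag values are invented for illustration.

```r
# Toy stand-in for the configuration table (names and flags are made up)
cfg_demo <- data.frame(
  name = c("Example sea ice concentration",
           "Example sea ice concentration near-real-time",
           "Example ocean colour"),
  do_sync = c(FALSE, FALSE, FALSE),
  method_flags = c("--recursive --no-parent", "--recursive --no-parent", "--recursive"),
  stringsAsFactors = FALSE
)

# Keep only the sea ice rows and switch synchronization on for them
myconfig_demo <- cfg_demo[grepl("sea ice", cfg_demo$name, ignore.case = TRUE), ]
myconfig_demo$do_sync <- TRUE

myconfig_demo[, c("name", "do_sync")]
```

Working on a subset leaves every other data set in the full table switched off, so nothing else is downloaded by accident.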
## myconfig holds the rows of cfg for the sea ice data set of interest
myconfig$method_flags
## only this year and last year (probably only 2015 is available for "final" anyway)
myconfig$method_flags[1] <-
  paste(myconfig$method_flags[1], "--accept=\"*nt_2016*\"", "--accept=\"*nt_2015*\"")
## only this year for near-real-time
myconfig$method_flags[2] <-
  paste(myconfig$method_flags[2], "--accept=\"*nt_2016*\"")
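The accept flags above are written out by hand for each year. A small helper could generate them instead, as in this sketch. accept_flags() is hypothetical and not part of raadsync; it simply reproduces the "*nt_YYYY*" pattern used above, including the asterisks.

```r
# Hypothetical helper: build one --accept flag per year, e.g. --accept="*nt_2016*"
accept_flags <- function(years, prefix = "nt_") {
  paste(sprintf("--accept=\"*%s%d*\"", prefix, years), collapse = " ")
}

# Flags for the last two years, ready to paste onto existing method flags
flags <- accept_flags(2015:2016)
paste("--recursive", flags)
```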
Synchronize away. Please note that this process is time-consuming, as it thoroughly checks the remote and local sources, including hash signatures for changed files when possible.
sync_repo(myconfig)
Build the file list cache. This is a convenience mechanism that saves the read functions from scanning the file system. Administrators, please note that the synchronization and file list caching may be set up as routine system jobs to keep everything up to date.
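The general idea of a file list cache can be sketched in base R as below. This is not the mechanism raadtools actually uses, only an illustration under stated assumptions: the directory layout, the file names and the cache file name are all invented. The file system is scanned once, the result is stored with paths relative to the root, and later reads load the stored list.

```r
# Made-up collection under a temporary root, with two empty placeholder files
root <- file.path(tempdir(), "raad_cache_demo")
dir.create(file.path(root, "seaice"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(root, "seaice", c("nt_20160101.bin", "nt_20160102.bin")))

# Scan once, keeping paths relative to the root
files <- data.frame(
  file = list.files(root, pattern = "\\.bin$", recursive = TRUE),
  stringsAsFactors = FALSE
)
cache_file <- file.path(root, "filelist_cache.rds")  # hypothetical cache name
saveRDS(files, cache_file)

# Later: load the cache instead of rescanning, and rebuild the full paths
cached <- readRDS(cache_file)
file.path(root, cached$file)
```

Storing root-less paths in the cache keeps it valid if the collection is mounted at a different root on another machine.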
library(raadtools)
Now check what we have in terms of files.
icf <- icefiles()
range(icf$date)
## size of collection (in MB) is pretty small given the limits we set above
sum(file.info(icf$fullname)$size)/1e6
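Date filtering on a file list of this shape can be sketched without any RAAD package. The assumptions here are that file names carry the date as nt_YYYYMMDD and that the list has fullname and date columns like those used above; the three file names are invented for illustration.

```r
# Made-up file names following an assumed nt_YYYYMMDD naming pattern
fullname <- c("seaice/nt_20150608_f17_v1.1_s.bin",
              "seaice/nt_20150609_f17_v1.1_s.bin",
              "seaice/nt_20160101_f17_v1.1_s.bin")

# Pull the eight-digit date out of each file name and parse it as UTC
datestr <- sub(".*nt_([0-9]{8}).*", "\\1", basename(fullname))
files_demo <- data.frame(
  fullname = fullname,
  date = as.POSIXct(strptime(datestr, "%Y%m%d", tz = "UTC")),
  stringsAsFactors = FALSE
)

# Date filter: keep files on or after 9 June 2015
recent <- subset(files_demo, date >= as.POSIXct("2015-06-09", tz = "UTC"))
recent$fullname
```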
Read the data and plot!
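## every 14th day across the whole local collection
ice <- readice(icf$date[seq(1, nrow(icf), by = 14)])
## or every day from 9 June 2015 onwards (this overwrites the subsample above)
ice <- readice(subset(icf, date >= as.POSIXct("2015-06-09"))$date)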
library(mapview)
cubeView(ice)