

Environmental data from extent using raadtools #7

J-Cleeland opened this issue 7 years ago

J-Cleeland commented 7 years ago

Hey @mdsumner, Just had a query from Dahlia about extracting remotely sensed data from an extent object using raadtools. Do you already have a tutorial for this? Or shall we make a new markdown for it? Cheers, Jaimie

mdsumner commented 7 years ago

All of the read functions have an "xylim" argument, which takes a raster::extent(xmin, xmax, ymin, ymax). E.g.

readsst("2010-01-01", xylim = extent(100, 160, -50, -40))

That extent must make sense for the data set though, so see what this returns first (it will be a longlat grid in [-180, 180, -90, 90]).

readsst()

That works for readssh, readcurr, and readwind just the same, and the date argument can be a single date or a vector of them.
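
For instance, here's a quick sketch of reading a short daily series (the dates and window are just illustrative):

dates <- seq(as.Date("2010-01-01"), by = "1 day", length.out = 7)
ssh <- readssh(dates, xylim = extent(100, 160, -50, -40))  ## one layer per date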

Some of the functions respect lon180 = FALSE because their grid was originally in [0, 360, -90, 90], and in that case it can be faster to use that domain for the extent rather than the common [-180, 180, -90, 90] one. (Please just ask, though; I can see that is probably hard to understand without seeing it in action.)
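
For example, a window that crosses the dateline is easiest to express in the native [0, 360] domain. A sketch, assuming readsst is one of the functions that respects lon180 (the window values are just illustrative):

readsst("2010-01-01", xylim = extent(160, 200, -50, -40), lon180 = FALSE)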

For ice, and some other data, the grids are not in longlat so xylim needs an extent in that projection. You can always draw one like this:

ice <- readice()
plot(ice)
e <- drawExtent()  ## click two corners on the plot to define the extent

Otherwise there are functions like projectExtent, and some others, to do it more automatically. It's not a simple task, though, because of the polar aspect.

ice <- readice()
rex <- raster(extent(100, 160, -50, -40), crs = "+proj=longlat +datum=WGS84")
pex <- projectExtent(rex, projection(ice))

## pex is a raster, not just an extent, but that's good because
## it also has the CRS (it's nonsense to have an extent without a CRS, but that's where we are)
readice("2017-01-01", xylim = pex)

I'm happy to answer stuff directly here; it's the best way, because otherwise I'll probably never get to the proper documentation ... :)

Finally, the xylim argument just applies crop, so you can avoid it altogether and do it after the fact.

crop(readsst(), extent(100, 160, -50, -40), snap = "out")

If there's a really long time series, though, it will be better to do that with xylim, and even to pass the result out to a file:

dates <- seq(as.Date("2010-01-01"), length = 365, by = "1 day")  ## a year of dates, for example
sst <- readsst(dates, xylim = extent(100, 160, -50, -40), filename = "mymegasplat.grd")

J-Cleeland commented 7 years ago

Cool. Thanks. That's a great start.

mdsumner commented 7 years ago

I wrote it up a little better.

http://rpubs.com/cyclemumner/raad-extract01

(I still can't get it to build and host with the package; still working on that.)

dahliasan commented 7 years ago

Thanks everyone, and the RPubs article was very helpful. I've got a question:

Going from the example:

dates <- seq(as.Date("2016-01-01"), length = 31, by = "1 day")
sst <- readsst(dates,  xylim = extent(100, 160, -50, -40))
msst <- calc(sst, mean)
sdsst <- calc(sst, sd)
plot(msst, col = viridis::viridis(50), main = "mean SST Jan 2016",  addfun = function() contour(sdsst, add = TRUE))

If I wanted to create a monthly (or weekly) averaged time series for multiple years, would the best way be to begin by creating a list of monthly date chunks (as above), then feeding them into readsst in a loop and combining them at the end into a single object?

Or,

get the entire SST time series into a single object and then manipulate it afterwards? I'm not too sure where or how to start with this one, though.

Hope that makes sense! Thanks :)

mdsumner commented 7 years ago

Best to make a vector of all the dates you want, read them all in and do one mean calc. You can process the files themselves to get just particular dates:

sstf <- sstfiles()
library(dplyr)
## filter the files to only be "recent Januarys"
janfiles <- sstf %>% filter(date >= as.POSIXct("2013-01-01"), format(date, "%m") == "01")
dim(janfiles)
#[1] 155   3

jansst <- readsst(janfiles$date, xylim = extent(100, 160, -50, -40))
nlayers(jansst)
# [1] 155

etc. If you need really big sets of time series it might matter to go about this slightly differently, but reading in 155 slices like that should be no problem.
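
To extend that to a monthly climatology, here's a sketch along the same lines (it assumes the sstf table and window from above, with na.rm = TRUE for missing pixels):

monthly_means <- lapply(sprintf("%02d", 1:12), function(mm) {
  f <- sstf %>% filter(date >= as.POSIXct("2013-01-01"), format(date, "%m") == mm)
  calc(readsst(f$date, xylim = extent(100, 160, -50, -40)), mean, na.rm = TRUE)
})
mclim <- stack(monthly_means)  ## 12 layers, one mean per calendar month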

mdsumner commented 7 years ago

In case anyone is reading this: it turns out that approach is extremely wasteful with temp files, and it's pretty slow. I've got code to fix specific cases, but this has triggered a rework of how raadtools does this generally. I'm tentatively optimistic that it's fixable in a short time, but feel free to ping me for workarounds in the meantime.

dahliasan commented 7 years ago

Hi Mike, the temporary workaround code works much faster!

Now I'm trying to figure out how to do the same thing for chlorophyll; do you know which set of files I should use? What I have now is this:

chlaf <- ocfiles(time.resolution = "daily", type = "L3m", varname = c("CHL", "SNPP_CHL"))

## filter the files for each month since 2000
my_date <- "2000-01-01"
janfiles <- chlaf %>% filter(date >= as.POSIXct(my_date), format(date, "%m") == "01")

my_extent <- extent(134, 144, -45, -35.5)
fn <- "chla.grd"  ## output file name (fn wasn't defined in my original snippet)
chla <- crop(stack(janfiles$fullname, varname = "chlor_a"), my_extent, filename = paste("jan_", fn, sep = ""))

plot(chla)

mchla <- calc(chla, mean, na.rm = TRUE)
sdchla <- calc(chla, sd, na.rm = TRUE)
plot(mchla, col = viridis::viridis(50), main = "mean chla Jan 2000-present", addfun = function() contour(sdchla, add = TRUE))

I tried playing with chlafiles() too, trying to find a product with the fewest gaps in the data, but they all seem about the same; maybe that's just the way it is?