bpbond opened 3 years ago
Soil moisture is the problem, and it's the thing we really need, because it's the classic reason temperature sensitivity breaks down. Options:
Step 1 recommendation: use `raster::extract()` to pull values out of a grib file at a given lon/lat.
Step 2 recommendation: write a function, F1, that takes a date and returns the name(s) of the files to load. For example, passing it 2010-01-11 should return `c("20100110_20100112.as1.grib", "20100110_20100112.as2.grib")`.
Test code could, for example, pass F1 two different dates within the same 3-day window and verify that the output is the same; pass F1 two dates in different 3-day windows and verify that the output differs; etc.
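A minimal base-R sketch of F1 together with the tests described above. The name `date_to_filenames()` is hypothetical, and so is the window rule: it assumes the 3-day windows are consecutive blocks counted from 1970-01-01, which happens to put 2010-01-11 in the 2010-01-10 to 2010-01-12 window matching the example filenames. The real naming scheme should be checked against the actual grib files.

```r
# Hypothetical F1: map a single date to the grib file(s) covering its 3-day window.
# ASSUMPTION: windows are consecutive 3-day blocks counted from 1970-01-01.
date_to_filenames <- function(date) {
  day_num <- as.numeric(as.Date(date))                  # days since 1970-01-01
  start <- as.Date(floor(day_num / 3) * 3, origin = "1970-01-01")
  stop <- start + 2                                     # last day of the window
  stem <- paste0(format(start, "%Y%m%d"), "_", format(stop, "%Y%m%d"))
  paste0(stem, c(".as1.grib", ".as2.grib"))             # the two variants in the example
}

# The example from the issue
stopifnot(identical(
  date_to_filenames("2010-01-11"),
  c("20100110_20100112.as1.grib", "20100110_20100112.as2.grib")
))
# Two dates in the same 3-day window give the same output
stopifnot(identical(date_to_filenames("2010-01-10"), date_to_filenames("2010-01-12")))
# Two dates in different 3-day windows give different output
stopifnot(!identical(date_to_filenames("2010-01-12"), date_to_filenames("2010-01-13")))
```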
We'd like all the machinery for filling in soil moisture (SM) data written, EXCEPT for the actual grib file read (see #24). This will involve writing a `get_sm_data(timestamp, lon, lat)` function.
Calling that once per timestamp would reread the same grib file many times. One solution would be to cache the values: when the function has a filename + lon + lat, it first checks whether it has already loaded that value, and if so, doesn't load it again.
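```r
# An environment is modified in place, so cached values persist between calls
cache <- new.env()
cached_load <- function(filename, lon, lat) {  # wrapper name is illustrative
  entry <- paste(filename, lon, lat)
  if (is.null(cache[[entry]])) {
    # Not loaded before: load from grib file and store the value
    cache[[entry]] <- load_from_grib_file(filename, lon, lat)
  }
  return(cache[[entry]])
}
```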
A BETTER solution is to rewrite `get_sm_data()` to handle a vector of timestamps. With a vector, we can call `unique()` on the filenames and load each file only once.

Start by writing a function that takes a vector of timestamps, converts them to dates, and then returns the unique filenames needed.
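A sketch of that first function in base R, assuming the timestamps are POSIXct in UTC and reusing a hypothetical window rule (3-day blocks counted from 1970-01-01). The name `timestamps_to_filenames()` and the `suffix` parameter are illustrative, not part of the thread.

```r
# Hypothetical helper: vector of timestamps -> unique grib filenames needed.
# ASSUMPTION: 3-day windows counted from 1970-01-01; timestamps are in UTC.
timestamps_to_filenames <- function(timestamps, suffix = ".as2.grib") {
  dates <- as.Date(timestamps, tz = "UTC")              # drop the time of day
  day_num <- as.numeric(dates)                          # days since 1970-01-01
  start <- as.Date(floor(day_num / 3) * 3, origin = "1970-01-01")
  filenames <- paste0(format(start, "%Y%m%d"), "_",
                      format(start + 2, "%Y%m%d"), suffix)
  unique(filenames)                                     # each file listed once
}

# Five daily timestamps spanning two 3-day windows
ts <- as.POSIXct("2010-01-10 06:00:00", tz = "UTC") + (0:4) * 86400
timestamps_to_filenames(ts)
# [1] "20100110_20100112.as2.grib" "20100113_20100115.as2.grib"
```

However many timestamps go in, the output only grows with the number of 3-day windows they span, which is what keeps the number of file reads small.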
OK @10aDing @jinshijian how about this as our test case:
```r
load_from_grib_file <- function(filename, lon, lat) {
  # Let's say it takes 1/10 of a second to open file and 1/20th s for each data point within file
  n <- length(lon)
  Sys.sleep(0.1 + 0.05 * n)
  return(rep(filename, n))
}

get_sm_data <- function(lon, lat, timestamps) {
  # you write this!
  # should return a vector of "soil moisture data" as gotten from load_from_grib_file
}

library(lubridate)
timestamps <- seq(ymd_hms("2020-10-22 16:42:00"), by = 1000, length.out = 10)
big_timestamps <- seq(ymd_hms("2020-10-22 16:42:00"), by = 1000, length.out = 1e5)

# Report timing:
system.time(get_sm_data(1, 1, timestamps))
system.time(get_sm_data(1, 1, big_timestamps))
```
Changed to 100,000 timestamps (not a million).
OK, I wrote up one solution that uses vectors, not a `for` loop or a join:
```
> system.time(get_sm_data(1, 1, timestamps))
2 different files to load
   user  system elapsed
  0.002   0.002   0.305
>
> system.time(get_sm_data(1, 1, big_timestamps))
404 different files to load
   user  system elapsed
  0.667   0.207  61.517
```
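As a sanity check, the elapsed times match the cost model in the test-case `load_from_grib_file()`: 0.1 s to open a file plus 0.05 s per data point, with one point (lon = 1, lat = 1) per call and each unique file loaded once:

$$
t \approx N_\text{files} \times (0.1 + 0.05 \times 1)\ \text{s}, \qquad 2 \times 0.15 = 0.3\ \text{s}, \qquad 404 \times 0.15 = 60.6\ \text{s}
$$

That is close to the measured 0.305 s and 61.517 s. By the same model, loading a file once per timestamp would take about $10^5 \times 0.15 = 15{,}000$ s (over four hours) for `big_timestamps`.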
Any other solutions, @jinshijian @10aDing? 😃
Working on it!
Here's my code FYI:
```r
library(lubridate)

get_sm_data <- function(lon, lat, timestamps) {
  # Round each timestamp to a 3-day boundary and build the matching filename
  rounded <- round_date(timestamps, unit = "3 days")
  starts <- format(rounded, "%Y%m%d")
  stops <- format(rounded + 60*60*24*2, "%Y%m%d")  # window end, 2 days later
  filenames <- paste0(starts, "_", stops, ".as2.grib")
  # Load each file only once...
  unique_fns <- unique(filenames)
  message(length(unique_fns), " different files to load")
  unique_data <- sapply(unique_fns, load_from_grib_file, lon, lat)
  # ...then look up each timestamp's value by its filename
  return(unique_data[filenames])
}

system.time(get_sm_data(1, 1, timestamps))
system.time(get_sm_data(1, 1, big_timestamps))
```
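Another possible approach, sketched in base R without lubridate: floor each timestamp to the start of its 3-day window (so the last day of a window still maps to that window's file), load each unique file once, and use `match()` to spread the values back over the timestamps. The window rule (3-day blocks counted from 1970-01-01), the name `get_sm_data2()`, its `loader` argument (defaulting to the test-case `load_from_grib_file()`), and the counting stand-in loader are all assumptions made so the block runs on its own.

```r
# Counter kept in an environment so the stand-in loader can update it in place
counter <- new.env()
counter$n <- 0

# Stand-in for load_from_grib_file() (no Sys.sleep) that counts file opens
counting_loader <- function(filename, lon, lat) {
  counter$n <- counter$n + 1
  rep(filename, length(lon))
}

# Hypothetical alternative: floor to 3-day windows counted from 1970-01-01,
# load each unique file once, then match() the values back to the timestamps
get_sm_data2 <- function(lon, lat, timestamps, loader = load_from_grib_file) {
  day_num <- as.numeric(as.Date(timestamps, tz = "UTC"))
  start <- as.Date(floor(day_num / 3) * 3, origin = "1970-01-01")
  filenames <- paste0(format(start, "%Y%m%d"), "_",
                      format(start + 2, "%Y%m%d"), ".as2.grib")
  unique_fns <- unique(filenames)
  # vapply() checks that each load returns exactly one value (single lon/lat point)
  unique_data <- vapply(unique_fns, loader, character(1), lon, lat,
                        USE.NAMES = FALSE)
  unique_data[match(filenames, unique_fns)]
}

demo_ts <- seq(as.POSIXct("2020-10-22 16:42:00", tz = "UTC"),
               by = 1000, length.out = 1e5)
sm <- get_sm_data2(1, 1, demo_ts, loader = counting_loader)
length(sm)  # one value per timestamp
counter$n   # files opened: one per 3-day window spanned
```

Using `match()` against the unique filenames avoids relying on names of the `sapply()` result, and `vapply()` fails loudly if a load ever returns something other than one value.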
Johnston, A. S. A. and Sibly, R. M.: The influence of soil communities on the temperature sensitivity of soil respiration, Nat Ecol Evol, 2(10), 1597–1602, 2018. http://dx.doi.org/10.1038/s41559-018-0648-6
Suseela, V., Conant, R. T., Wallenstein, M. D. and Dukes, J. S.: Effects of soil moisture on the temperature sensitivity of heterotrophic respiration vary seasonally in an old-field climate change experiment, Glob. Chang. Biol., 18(1), 336–348, 2012. http://dx.doi.org/10.1111/j.1365-2486.2011.02516.x
Reanalysis climate data