CollinWoo / daynight-Q10

MIT License

Next steps in Q10 work #21

Open bpbond opened 3 years ago

bpbond commented 3 years ago

Johnston, A. S. A. and Sibly, R. M.: The influence of soil communities on the temperature sensitivity of soil respiration, Nat Ecol Evol, 2(10), 1597–1602, 2018. http://dx.doi.org/10.1038/s41559-018-0648-6

Suseela, V., Conant, R. T., Wallenstein, M. D. and Dukes, J. S.: Effects of soil moisture on the temperature sensitivity of heterotrophic respiration vary seasonally in an old-field climate change experiment, Glob. Chang. Biol., 18(1), 336–348, 2012. http://dx.doi.org/10.1111/j.1365-2486.2011.02516.x

Reanalysis climate data

bpbond commented 3 years ago

Soil moisture is the problem and the thing we really need, because that's the classic reason temperature sensitivity breaks down. Options:

bpbond commented 3 years ago

Step 1 recommendation:

bpbond commented 3 years ago

Step 2 recommendation:

For example, test code can: pass F1 two different dates within the same 3-day window and verify they produce the same output; pass F1 two dates in different 3-day windows and verify they produce different outputs; etc.
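A minimal sketch of such tests in base R. Here `F1` is a hypothetical stand-in for the real date-to-filename function; the fixed origin date and the `%/% 3` window arithmetic are illustrative assumptions, not the actual implementation:

```r
# Hypothetical stand-in for F1: maps a date to its 3-day-window "filename".
# The 2020-01-01 origin and naming scheme are assumptions for illustration.
F1 <- function(d) {
  origin <- as.Date("2020-01-01")
  window <- as.integer(as.Date(d) - origin) %/% 3
  paste0("window_", window, ".grib")
}

# Two dates in the same 3-day window -> same output
stopifnot(F1("2020-10-22") == F1("2020-10-23"))
# Two dates in different 3-day windows -> different outputs
stopifnot(F1("2020-10-22") != F1("2020-10-30"))
```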

bpbond commented 3 years ago

We'd like all the machinery for filling in SM data to be written, EXCEPT for the actual grib file read (see #24). This will involve:

bpbond commented 3 years ago

One solution would be to cache the values. In other words, when the function has filename + lon + lat it first checks whether it's already loaded this value before, and if so, doesn't bother doing so again.

bpbond commented 3 years ago
cache <- new.env()  # an environment is mutable, so updates made inside the function persist

get_cached_value <- function(filename, lon, lat) {
  entry <- paste(filename, lon, lat)
  if(is.null(cache[[entry]])) {
    # load from grib file
    cache[[entry]] <- value_we_loaded
  }
  cache[[entry]]
}
bpbond commented 3 years ago

A BETTER solution is to rewrite get_sm_data() to handle a vector of timestamps.

bpbond commented 3 years ago

With a vector, we can do this:

  1. Construct filenames from every vector element
  2. Get the unique() filenames
  3. Load the grib data
  4. Merge/lookup and return full vector
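A minimal sketch of steps 2–4 above, with a stub standing in for the grib loader (the function name and the stub's return value are placeholders, not the real reader):

```r
lookup_all <- function(filenames) {
  unique_fns <- unique(filenames)                 # step 2: each file only once
  # step 3 (stub): the real version would read each grib file here
  unique_vals <- vapply(unique_fns,
                        function(f) paste("data from", f),
                        character(1))
  unique_vals[match(filenames, unique_fns)]       # step 4: expand back to full vector
}

lookup_all(c("a.grib", "b.grib", "a.grib"))
```

The `match()` lookup is what lets the expensive load run once per unique file while still returning one value per input timestamp.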
bpbond commented 3 years ago

Start by writing a function that takes a vector of timestamps; converts them to dates; and then returns the unique filenames needed.
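A sketch of that starter function. The 3-day rounding and the `starts_stops.as2.grib` naming pattern follow the code posted later in this thread; treat the exact pattern as an assumption:

```r
library(lubridate)

# Takes a vector of timestamps and returns the unique filenames needed.
unique_filenames <- function(timestamps) {
  rounded <- round_date(timestamps, unit = "3 days")
  starts  <- format(rounded, "%Y%m%d")
  stops   <- format(rounded + days(2), "%Y%m%d")
  unique(paste0(starts, "_", stops, ".as2.grib"))
}

timestamps <- seq(ymd_hms("2020-10-22 16:42:00"), by = 1000, length.out = 10)
unique_filenames(timestamps)
```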

bpbond commented 3 years ago

OK @10aDing @jinshijian how about this as our test case:

load_from_grib_file <- function(filename, lon, lat) {
  # Let's say it takes 1/10 of a second to open file and 1/20th s for each data point within file
  n <- length(lon)
  Sys.sleep(0.1 + 0.05 * n)
  return(rep(filename, n))
}

get_sm_data <- function(lon, lat, timestamps) {
  # you write this!
  # should return a vector of "soil moisture data" as gotten from load_from_grib_file
}

library(lubridate)
timestamps <- seq(ymd_hms("2020-10-22 16:42:00"), by = 1000, length.out = 10)
big_timestamps <- seq(ymd_hms("2020-10-22 16:42:00"), by = 1000, length.out = 1e5)

# Report timing:
system.time(get_sm_data(1, 1, timestamps))
system.time(get_sm_data(1, 1, big_timestamps))
bpbond commented 3 years ago

Changed to 100,000 timestamps (not a million).

bpbond commented 3 years ago

OK, I wrote up one solution that uses vectors, not a for loop or a join:

> system.time(get_sm_data(1, 1, timestamps))
2 different files to load
   user  system elapsed 
  0.002   0.002   0.305 
> 
> system.time(get_sm_data(1, 1, big_timestamps))
404 different files to load
   user  system elapsed 
  0.667   0.207  61.517 
bpbond commented 3 years ago

Any other solutions @jinshijian @10aDing 😃

CollinWoo commented 3 years ago

Working on it!

bpbond commented 3 years ago

Here's my code FYI:

get_sm_data <- function(lon, lat, timestamps) {
  rounded <- round_date(timestamps, unit = "3 days")
  starts <- gsub("-", "", as.character(rounded))
  stops <- gsub("-", "", as.character(rounded + 60*60*24*2))
  filenames <- paste0(starts, "_", stops, ".as2.grib")

  unique_fns <- unique(filenames)
  message(length(unique_fns), " different files to load")
  unique_data <- sapply(unique_fns, load_from_grib_file, lon, lat)
  return(unique_data[filenames])
}

system.time(get_sm_data(1, 1, timestamps))
system.time(get_sm_data(1, 1, big_timestamps))