CollinWoo / daynight-Q10

MIT License

Next steps in Q10 work #21

Open bpbond opened 3 years ago

bpbond commented 3 years ago

Johnston, A. S. A. and Sibly, R. M.: The influence of soil communities on the temperature sensitivity of soil respiration, Nat Ecol Evol, 2(10), 1597–1602, 2018. http://dx.doi.org/10.1038/s41559-018-0648-6

Suseela, V., Conant, R. T., Wallenstein, M. D. and Dukes, J. S.: Effects of soil moisture on the temperature sensitivity of heterotrophic respiration vary seasonally in an old-field climate change experiment, Glob. Chang. Biol., 18(1), 336–348, 2012. http://dx.doi.org/10.1111/j.1365-2486.2011.02516.x

Reanalysis climate data

bpbond commented 3 years ago

Soil moisture is the problem and the thing we really need, because that's the classic reason temperature sensitivity breaks down. Options:

bpbond commented 3 years ago

Step 1 recommendation:

bpbond commented 3 years ago

Step 2 recommendation:

For example, test code can: pass F1 two different dates within the same 3-day window and verify they produce the same output; pass F1 two dates in different 3-day windows and verify they produce different outputs; etc.
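A minimal sketch of such tests in base R. Here `F1` is a hypothetical stand-in for the real date-to-filename function; the fixed origin date and the `%/% 3` window arithmetic are illustrative assumptions, not the actual implementation:

```r
# Hypothetical stand-in for F1: maps a date to its 3-day-window "filename".
# The 2020-01-01 origin and naming scheme are assumptions for illustration.
F1 <- function(d) {
  origin <- as.Date("2020-01-01")
  window <- as.integer(as.Date(d) - origin) %/% 3
  paste0("window_", window, ".grib")
}

# Two dates in the same 3-day window -> same output
stopifnot(F1("2020-10-22") == F1("2020-10-23"))
# Two dates in different 3-day windows -> different outputs
stopifnot(F1("2020-10-22") != F1("2020-10-30"))
```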

bpbond commented 3 years ago

We'd like all the machinery for filling in SM data to be written, EXCEPT for the actual grib file read (see #24). This will involve:

bpbond commented 3 years ago

One solution would be to cache the values. In other words, when the function has filename + lon + lat it first checks whether it's already loaded this value before, and if so, doesn't bother doing so again.

bpbond commented 3 years ago
cache <- new.env()  # an environment is mutable, so updates made inside the function persist

get_cached_value <- function(filename, lon, lat) {
  entry <- paste(filename, lon, lat)
  if(is.null(cache[[entry]])) {
    # load from grib file
    cache[[entry]] <- value_we_loaded
  }
  cache[[entry]]
}
bpbond commented 3 years ago

A BETTER solution is to rewrite get_sm_data() to handle a vector of timestamps.

bpbond commented 3 years ago

With a vector, we can do this:

  1. Construct filenames from every vector element
  2. Get the unique() filenames
  3. Load the grib data
  4. Merge/lookup and return full vector
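A minimal sketch of steps 2–4 above, with a stub standing in for the grib loader (the function name and the stub's return value are placeholders, not the real reader):

```r
lookup_all <- function(filenames) {
  unique_fns <- unique(filenames)                 # step 2: each file only once
  # step 3 (stub): the real version would read each grib file here
  unique_vals <- vapply(unique_fns,
                        function(f) paste("data from", f),
                        character(1))
  unique_vals[match(filenames, unique_fns)]       # step 4: expand back to full vector
}

lookup_all(c("a.grib", "b.grib", "a.grib"))
```

The `match()` lookup is what lets the expensive load run once per unique file while still returning one value per input timestamp.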
bpbond commented 3 years ago

Start by writing a function that takes a vector of timestamps; converts them to dates; and then returns the unique filenames needed.
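A sketch of that starter function. The 3-day rounding and the `starts_stops.as2.grib` naming pattern follow the code posted later in this thread; treat the exact pattern as an assumption:

```r
library(lubridate)

# Takes a vector of timestamps and returns the unique filenames needed.
unique_filenames <- function(timestamps) {
  rounded <- round_date(timestamps, unit = "3 days")
  starts  <- format(rounded, "%Y%m%d")
  stops   <- format(rounded + days(2), "%Y%m%d")
  unique(paste0(starts, "_", stops, ".as2.grib"))
}

timestamps <- seq(ymd_hms("2020-10-22 16:42:00"), by = 1000, length.out = 10)
unique_filenames(timestamps)
```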

bpbond commented 3 years ago

OK @10aDing @jinshijian how about this as our test case:

load_from_grib_file <- function(filename, lon, lat) {
  # Let's say it takes 1/10 of a second to open file and 1/20th s for each data point within file
  n <- length(lon)
  Sys.sleep(0.1 + 0.05 * n)
  return(rep(filename, n))
}

get_sm_data <- function(lon, lat, timestamps) {
  # you write this!
  # should return a vector of "soil moisture data" as gotten from load_from_grib_file
}

library(lubridate)
timestamps <- seq(ymd_hms("2020-10-22 16:42:00"), by = 1000, length.out = 10)
big_timestamps <- seq(ymd_hms("2020-10-22 16:42:00"), by = 1000, length.out = 1e5)

# Report timing:
system.time(get_sm_data(1, 1, timestamps))
system.time(get_sm_data(1, 1, big_timestamps))
bpbond commented 3 years ago

Changed to 100,000 timestamps (not a million).

bpbond commented 3 years ago

OK, I wrote up one solution that uses vectors, not a for loop or a join:

> system.time(get_sm_data(1, 1, timestamps))
2 different files to load
   user  system elapsed 
  0.002   0.002   0.305 
> 
> system.time(get_sm_data(1, 1, big_timestamps))
404 different files to load
   user  system elapsed 
  0.667   0.207  61.517 
bpbond commented 3 years ago

Any other solutions @jinshijian @10aDing 😃

CollinWoo commented 3 years ago

Working on it!

bpbond commented 3 years ago

Here's my code FYI:

get_sm_data <- function(lon, lat, timestamps) {
  rounded <- round_date(timestamps, unit = "3 days")
  starts <- gsub("-", "", as.character(rounded))
  stops <- gsub("-", "", as.character(rounded + 60*60*24*2))
  filenames <- paste0(starts, "_", stops, ".as2.grib")

  unique_fns <- unique(filenames)
  message(length(unique_fns), " different files to load")
  unique_data <- sapply(unique_fns, load_from_grib_file, lon, lat)
  return(unique_data[filenames])
}

system.time(get_sm_data(1, 1, timestamps))
system.time(get_sm_data(1, 1, big_timestamps))