MazamaScience / MazamaSatelliteUtils

Satellite Data Download and Utility Functions
http://mazamascience.github.io/MazamaSatelliteUtils
GNU Lesser General Public License v3.0

new goesaodc_createNativeGrid() function #67

Closed jonathancallahan closed 4 years ago

jonathancallahan commented 4 years ago

I played with the new functionality in goesaodc_createTibble() for loading only the data within a bounding box. Examples are seen in:

Pretty much all the building block capabilities we need are there except for one -- averaging time steps on the native grid.

I believe that a very general way to provide this is to pull most of the guts out of goesaodc_createTibble() and create a new function goesaodc_createNativeGrid(). The basic outline will be:

goesaodc_createNativeGrid <- function(
  nc = NULL,
  bbox = NULL
) {
  ...
  return(nativeGrid)
}

The new feature of this function is that the nc parameter can be either a single nc handle OR a list of nc handles.

If it is a single handle, process it just as was previously done inside of ~_createTibble(), returning a nativeGrid list object with elements longitude, latitude, AOD and DQF.

If it is a list, process it as demonstrated in local_jon/create-native-grid-example.R. The final step will be building a nativeGrid object whose [["AOD"]] element is an average over all of the nc files that were used. Averaging should be done with na.rm = TRUE.

The goal is to have the option of averaging an hour (or more?) of data on the native grid before passing it upstream for additional processing.
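
A rough sketch of how this could hang together (not the final implementation; .readNativeGrid() is a hypothetical stand-in for the per-file code currently inside ~_createTibble(), and each AOD element is assumed to be a 2-D matrix on the native grid):

goesaodc_createNativeGrid <- function(nc = NULL, bbox = NULL) {

  # Treat a single ncdf4 handle as a one-element list
  if (inherits(nc, "ncdf4")) {
    ncList <- list(nc)
  } else {
    ncList <- nc
  }

  # Read one time step per handle on the native grid
  # (.readNativeGrid() is hypothetical and would return
  #  list(longitude, latitude, AOD, DQF) for a single handle)
  gridList <- lapply(ncList, function(x) .readNativeGrid(x, bbox))

  # Start from the first time step (grid geometry; DQF handling is left open here)
  nativeGrid <- gridList[[1]]

  # Average AOD across time steps at each grid position, ignoring NAs
  aodArray <- abind::abind(lapply(gridList, `[[`, "AOD"), rev.along = 0)
  nativeGrid[["AOD"]] <- apply(aodArray, c(1, 2), mean, na.rm = TRUE)

  return(nativeGrid)
}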

Tasks will include:

lagerratrobe commented 4 years ago

I would like to change one thing from what's specified here - instead of working from a list of .nc file handles, to instead use a list of .nc filenames. This is how the example code is structured in local_jon/create-native-grid-example.R and I think it makes more sense from a user's perspective.

In reviewing all of the current code examples, I see that we often ask the user to do this:

#' netCDF <- "OR_ABI-L2-AODC-M6_G16_s20192491826095_e20192491828468_c20192491835127.nc"
#' nc <- goesaodc_openFile(netCDF)
#' rstr <- goesaodc_createRaster(nc, res = 0.1, dqfLevel = 2)

#----#
#' ncFile <- "OR_ABI-L2-AODC-M6_G16_s20192491826095_e20192491828468_c20192491835127.nc"
#' nc <- goesaodc_openFile(ncFile)
#' tbl <- goesaodc_createTibble(nc)

#----#
#' netCDF <- "OR_ABI-L2-AODC-M6_G16_s20192491826095_e20192491828468_c20192491835127.nc"
#' nc <- goesaodc_openFile(netCDF)
#' sp <- goesaodc_createSpatialPoints(nc, dqfLevel = 2, bbox = kincade_bbox)

I think it would be straightforward to put the goesaodc_openFile(netCDF) call inside the "create" functions and allow the user to pass .nc filenames into them, rather than forcing them to open the filehandle manually. This has the added benefit that we already have a listFiles function that can be used to create lists of files.

Oh yes, and it would give us the additional benefit of being able to explicitly close .nc filehandles, which right now are being left open.
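
A minimal sketch of that pattern (the wrapper name here is hypothetical, not the final API), assuming goesaodc_openFile() returns an ncdf4 handle that ncdf4::nc_close() can close:

# Hypothetical filename-based wrapper, for illustration only
createTibbleFromFilename <- function(ncFile) {

  nc <- goesaodc_openFile(ncFile)

  # Make sure the filehandle gets closed, even if something errors out
  on.exit(ncdf4::nc_close(nc), add = TRUE)

  return(goesaodc_createTibble(nc))
}

A list of filenames could then be handled by lapply()-ing this same open/process/close pattern over the files returned by the existing listFiles function.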

lagerratrobe commented 4 years ago

Started this implementation and am able to generate a list of tibbles based on a list of input files. This is not what the issue specified, but I wanted to be able to work backwards and make sure that I wasn't butchering too much. One thing that concerns me is that the output from one file to the next does not match in row count or coordinate values. See below.

# Note that the coordinate values and counts for each tibble don't match.

# names(filtered_tbl)
# [1] "201910271201" "201910271416"

# > head(filtered_tbl)
# $`201910271201`
# A tibble: 1,920 x 4
#     lon   lat     AOD   DQF
#   <dbl> <dbl>   <dbl> <dbl>
# 1 -125.  39.2 0.00972     2
# 2 -125.  39.2 0.0116      2
# 3 -125.  39.2 0.0171      2
# 4 -125.  39.2 0.0126      2
# 5 -125.  39.2 0.0102      2
# 6 -125.  39.2 0.0130      2
# 7 -125.  39.2 0.0124      2
# 8 -125.  39.1 0.0107      2
# 9 -125.  39.1 0.0187      2
#10 -125.  39.1 0.0141      2
# ... with 1,910 more rows

#$`201910271416`
# A tibble: 1,911 x 4
#     lon   lat    AOD   DQF
#   <dbl> <dbl>  <dbl> <dbl>
# 1 -124.  39.0 0.0691     0
# 2 -124.  39.0 0.0548     0
# 3 -124.  39.0 0.0619     0
# 4 -124.  39.0 0.0476     0
# 5 -124.  39.0 0.0375     0
# 6 -124.  39.0 0.0704     0
# 7 -124.  39.0 0.0332     0
# 8 -124.  39.0 0.0503     1
# 9 -124.  39.0 0.0597     1
#10 -124.  39.0 0.0764     1
# ... with 1,901 more rows

# The code I used to generate it looks like this:

library(MazamaSatelliteUtils)
setSatelliteDataDir("~/Data/Satellite")

#kincade_bbox <- c(-124, -120, 36, 39) # original values
kincade_bbox <- c(-124, -123, 38, 39) # small area

my_files <- c(
  "OR_ABI-L2-AODC-M6_G16_s20193001901344_e20193001904117_c20193001907158.nc",
  "OR_ABI-L2-AODC-M6_G17_s20193002116196_e20193002118569_c20193002121014.nc"
)

filtered_tbl <- goesaodc_createNativeGrid(
  my_files,
  bbox = kincade_bbox)

print(head(filtered_tbl))

# MINIMUM LON/LAT COORDS DON'T MATCH BETWEEN FILES
for (timestamp in names(filtered_tbl)) {
  data <- filtered_tbl[[timestamp]]
  print(timestamp)
  min_lon <- min(data$lon)
  min_lat <- min(data$lat)
  print(sprintf("%f,%f", min_lon, min_lat))
}

What concerns me about this is that I believe the intent in Jonathan's example is that we average across "layers" by matrix position. If that's the case, these offsets are going to be problematic. Also, it's clear that we can't average across these layers and keep the DQF values. So if we want to do DQF filtering, which we currently use in everything from createSpatialPoints on down, we need to push the DQF filter back up into the createNativeGrid() function. Maybe do it at the same place we do the bbox filtering?
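
For the DQF piece specifically, one option (only a sketch, not what ended up being implemented) would be to mask values above the requested dqfLevel to NA at the same stage as the bbox filtering, so that they simply drop out of any na.rm = TRUE average. A toy illustration with made-up values:

# Toy AOD and DQF vectors (made-up values)
aod <- c(0.01, 0.02, 0.03, 0.04)
dqf <- c(0, 1, 2, 3)
dqfLevel <- 2

# Mask anything with DQF above the requested level
aod[dqf > dqfLevel] <- NA

# Masked values drop out of the average
mean(aod, na.rm = TRUE)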

# ONE WAY WE COULD DO IT...

# SOME SAMPLE DATA
#     layer_1      #
#  lon   lat    AOD
# -124.0  39.0   18.5
# -124.0  39.1   18.5
# -125.0  39.1   18.5
# -125.0  39.2   18.5
#------------------#
#      layer_2     #
# -124.0  39.0   21.5
# -124.0  39.1   21.5
# -125.0  39.3   21.5
# -125.0  39.1   21.5
####################
library(dplyr)

# COMBINE THEM ALL INTO THE SAME DATA SET
varlist_3 <- list()
varlist_3[["lon"]] <- c(-124.0, -124.0, -125.0, -125.0, -124.0, -124.0, -125.0, -125.0)
varlist_3[["lat"]] <- c(39.0, 39.1, 39.1, 39.2, 39.0, 39.1, 39.3, 39.1)
varlist_3[["AOD"]] <- c(18.5, 18.5, 18.5, 18.5, 21.5, 21.5, 21.5, 21.5)

tbl_3 <- tibble::as_tibble(varlist_3)

# GROUP BY AND SUMMARIZE ON THE COORDINATES (average AOD, ignoring NAs)
output <- tbl_3 %>%
  group_by(lon, lat) %>%
  summarise(AOD_mean = mean(AOD, na.rm = TRUE))

print(output)

lagerratrobe commented 4 years ago

Implemented a small example in local_R/spatial_aggregation.R and have put some of the relevant code here. Hopefully this captures the intent of the nativeGrid prototype code.

# CREATE 2 LISTS THAT HAVE THE SAME NUMBER OF POINTS, BUT DON'T SPATIALLY OVERLAP
varlist_1 <- list()
varlist_2 <- list()

# LIST 1
varlist_1[["lon"]] <- c(-124.0, -124.0, -125.0, -125.0)
varlist_1[["lat"]] <- c(39.0, 39.1, 39.1, 39.2)
varlist_1[["AOD"]] <- c(18.5, 18.5, 18.5, 18.5)

# LIST 2
varlist_2[["lon"]] <-  c(-124.0, -124.0, -125.0, -125.0)
varlist_2[["lat"]] <- c(39.0, 39.1, 39.3, 39.1)
varlist_2[["AOD"]] <- c(21.5, 21.5, 21.5, 21.5)

dqf_vals <- c(2, 2, 1, 2)

varlist_1[["DQF"]] <- dqf_vals
varlist_2[["DQF"]] <- dqf_vals

label_1 <- "201910271201"
label_2 <- "201910271416"

nativeGridList <- list()
nativeGridList[[label_1]] <- varlist_1
nativeGridList[[label_2]] <- varlist_2

print(nativeGridList)

# NOW FOR THE AVERAGING
# Create a list containing only the AOD arrays
# (1-D vectors in this toy example; 2-D matrices on the real native grid)
AODList <- purrr::map(nativeGridList, ~ .x[["AOD"]])
print(AODList)

# Stack the AOD arrays along a new last dimension
stackedNativeArray <- abind::abind(AODList, rev.along = 0)
print(stackedNativeArray)

# Calculate the mean at every grid position, ignoring NAs
# (margin 1 for these 1-D toy vectors; c(1, 2) once AOD is a 2-D native grid)
AOD_mean <- apply(stackedNativeArray, 1, mean, na.rm = TRUE)
print(AOD_mean)
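
For reference, the same stack-and-average pattern with 2-D matrices standing in for real native-grid AOD arrays (toy values only), which is where the c(1, 2) margin comes in:

# Two toy 2 x 2 "native grid" AOD matrices representing two time steps
aod_1 <- matrix(c(18.5, 18.5, NA, 18.5), nrow = 2)
aod_2 <- matrix(c(21.5, 21.5, 21.5, 21.5), nrow = 2)

# Stack into a 2 x 2 x 2 array (rows x cols x time steps)
stacked <- abind::abind(list(aod_1, aod_2), rev.along = 0)

# Average across time steps at each grid position; na.rm = TRUE handles the NA
AOD_mean <- apply(stacked, c(1, 2), mean, na.rm = TRUE)
print(AOD_mean)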

lagerratrobe commented 4 years ago

Jonathan completed this work yesterday.