eliocamp / metR

Tools for Easier Analysis of Meteorological Fields
https://eliocamp.github.io/metR/
140 stars 22 forks source link

EOF 3d lon lat time array example #132

Closed chrisdane closed 3 years ago

chrisdane commented 3 years ago

Hi Elio

Thanks for you nice package. I tried to redo the EOF example based on the geopotential height data. Since I often work with 3-dim data (lon, lat, time), I tried to apply the function on a 3-dim array directly, and not a data.table object.

Unfortunately, I am not used to the data.table notation. For instance, I am puzzled by the

geopotential[, gh.t.w := Anomaly(gh)*sqrt(cos(lat*pi/180)), by = .(lon, lat, month(date))]

call. gh is grouped by months and then the anomaly is calculated? But how can the date dimension be kept if a grouping is applied? Sorry for this noob question.

Then, how could I proceed with something like

library(data.cube)
gp_arr <- data.cube:::as.array.data.table(geopotential, dimcols=c("lon", "lat", "date"), measure="gh")
str(gp_arr)
# num [1:144, 1:28, 1:72] 2716 2716 2716 2716 2716 ...
# - attr(*, "dimnames")=List of 3
#  ..$ lon : chr [1:144] "0" "2.5" "5" "7.5" ...
#  ..$ lat : chr [1:28] "-90" "-87.5" "-85" "-82.5" ...
#  ..$ date: chr [1:72] "7305" "7336" "7364" "7395" ...

as input for Rmet::EOF()?

Thanks a lot, Chris

eliocamp commented 3 years ago

gh is grouped by months and then the anomaly is calculated? But how can the date dimension be kept if a grouping is applied? Sorry for this noob question.

No apology necessary. That code computes the anomaly for each combination of longitude, latitude and month(date). The month(date) group is not added as a new column (which is different from the way dplyr works, I think) and all previous columns are retained.

The equivalent dplyr code would be something like this:

geopotential %>%
   group_by(month = month(date)) %>%
   mutate(gh.t.w = Anomaly(gh)*sqrt(cos(lat*pi/180)) %>%
   select(-month)

EOF() is to be used with tidy dataframes, not matrices. If your data is in matrix form, then you just need to perform Singular Value Decomposition. Use svd. However, classical SVD is defined for matrices, not arrays (with more than 2 dimensions). Your data should have points in space (combinations of lon lat) as rows and points in time as columns (or the transpose).

btw, thanks for the example. I was aware of the data.cube package but didn't know about the as.array.data.table method. It sounds really useful!