JGCRISummer2024 / Global-Scale-Birch-Effect

Detecting a Birch Effect on a global scale using global databases like SRDB, SPEI, WorldClim, and NPP
MIT License
0 stars 0 forks source link

Next steps - calculating 'birch' #3

Closed bpbond closed 2 weeks ago

bpbond commented 1 month ago
  1. WorldClim averages (MAT and MAP) and pulling in NPP data from NASA MODIS (see email)

  2. SPEI - this is expensive (slow) so we have several steps...

bpbond commented 1 month ago

Pull out data for our coordinates, caching for future use

# This is expensive, so we cache the extracted SPEI data
spei_dat_file <- paste("spei", nrow(coords), digest::digest(coords), sep = "_")
if(file.exists(spei_dat_file)) {
    message("Loading saved data ", spei_dat_file)
    spei_dat <- readRDS(spei_dat_file)
} else {
    message("Extracting SPEI data; this is slow...")
    # Documentation: https://spei.csic.es/database.html
    # Data downloaded 2024-07-17
    library(terra)
    spei <- rast("spei_data/spei01.nc")

    # Extract our points of interest. terra::extract() will handle making sure
    # the coordinates get mapped to the correct grid cell(s) in the data
    spei_dat <- terra::extract(spei, coords)
    saveRDS(spei_dat, file = spei_dat_file)
}

Now we need to reshape and add a time column:

# Reshape data into a more manageable form
library(tidyr)
spei_monthly <- pivot_longer(spei_dat, -ID)
spei_monthly <- separate(spei_monthly, name, into = c("spei", "entry"), convert = TRUE)
# The SPEI data don't seem to provide 'time' explicitly in the netcdf, so
# compute it from the entries
spei_monthly$year <- ceiling(spei_monthly$entry / 12) + 1900
spei_monthly$month <- (spei_monthly$entry - 1) %% 12 + 1
spei_monthly$time <- with(spei_monthly, year + (month-1) / 12)
bpbond commented 1 month ago

Compute either growing season drought or annual drought index

I chose gsd (growing season) but don't know that's right

# Compute gsd - growing season drought
spei_monthly %>%
    arrange(ID, year, month) %>%
    group_by(ID, year) %>%
    summarise(gsd = mean(value[6:8]), .groups = "drop") ->
    gsd

# Merge back with coords
coords %>%
    mutate(ID = seq_len(nrow(coords))) %>%
    left_join(gsd, by = "ID") ->
    gsd

At the end of this, gsd has coordinates, year, and growing season SPEI average

bpbond commented 1 month ago

For each SRDB row we're using, look up SPEI(yr) and SPEI(yr-1)

# For each lon/lat pair in the SRDB data, look up gsd0 (current year SPEI),
# gsd1 (last year), and gsd2 (etc)
get_spei <- function(lon, lat, yr) {
    gsd %>%
        filter(Longitude == lon, Latitude == lat) %>%
        arrange(year) -> x

    if(nrow(x) == 0) return(c(NA, NA)) # not found

    y <- which(x$year == yr)
    x$gsd[c(y, y-1]
}

srdb %>%
    select(Study_midyear, Longitude, Latitude) %>%
    mutate(year = floor(Study_midyear)) %>%
    distinct() ->
    srdb_spei

message("Looking up growing season SPEI (yr 0, -1, -2) for SRDB data...")
srdb_spei$gsd2 <- srdb_spei$gsd1 <- srdb_spei$gsd0 <- NA_real_
for(i in seq_len(nrow(srdb_spei))) {
    spei_i <- get_spei(srdb_spei$Longitude[i], srdb_spei$Latitude[i], srdb_spei$year[i])
    srdb_spei$gsd0[i] <- spei_i[1]
    srdb_spei$gsd1[i] <- spei_i[2]
}
bpbond commented 1 month ago

Calculate 'birch' flag

        # I am not sure about having an arbitrary 'drought cutoff'
        # What about a birch variable that is
        #   (gsd0 - gsd1) if gsd0 >=0, and 0 otherwise
        # In other words, the gsd difference from year 0 to year -1, but only
        # when gsd0 (the current year) is at least 0, i.e. normal
        birch_lgl = gsd0 > 0 & gsd1 < drought_cutoff,
H-Stein commented 1 month ago

image

H-Stein commented 1 month ago

image Unless I'm misinterpreting this, if we assume that there is no relation between the annual CO2 flux and a drought of at least -1 on SPEI scale ending, these events have nearly no chance of occurring (p < .0.000001). @bpbond this p value is VERY low which is interesting, but can I trust it? I could just be overthinking it but are we allowed to do this kind of test when one variable is an integer and one is a boolean? How would that work? Or am I misunderstanding what kind of test it's doing.

EDIT: spei_flag is a boolean value that's true when the average SPEI of the growing season (which I defined as the months that average above 10 degrees Celsius, a definition inspired by the USDA's definition of the growing season, which they claim "can be approximated as the period of time between the average date of the last killing frost in the spring to the average date of the first killing frost in the fall. This represents a temperature threshold of 28 degrees F or lower at a frequency of 5 years in 10.") of the year of the study is above 0, and the average SPEI of the growing season the previous year was at most -1.

bpbond commented 1 month ago

@H-Stein P-values are the probability of these data occurring if — IF — there's actually no relationship between the variables being tested. So in the table you've pasted above, the p-value for spei_flag is very low, <0.001, meaning that's it's very unlikely for these data to exist if there's no Birch effect happening.

In other words, the spei_flag — the "birch effect flag" — is highly significant at predicting Annual_CO2_Flux. 😮

bpbond commented 1 month ago

THAT IS COOL!

H-Stein commented 1 month ago
H-Stein commented 1 month ago

Tested different spei_flag values, here is the table. image SPEI_Boundary is the cutoff for spei_flag. For example, if SPEI_Boundary is -0.5, spei_flag will be true if the last year had an average growing season SPEI of at most -0.5 and the year of the measurement had an average growing season SPEI of at least 0. Significance is the p value returned for spei_flag by running a type III ANOVA test on the data, using MAT, MAP, NPP, spei_0, spei_1, and spei_flag as predictors for Rs_annual

It appears that spei_flag is significant at predicting Rs_annual at SPEI_Boundary values up to -0.3, but is not very significant at predicting Rs_annual at boundary value of -0.6 for some reason.

bpbond commented 1 month ago

Oh, awesome! 👏 Nice work @H-Stein

The 0.6 thing is a blip — a random false-negative. Focus on the overall pattern, which is that your results are robust across various SPEI cutoff assumptions. Cool!

H-Stein commented 1 month ago

Played around with precipitation filtering a little bit and got interesting results. This output is when I filter for only places that receive less than 750mm of rain a year (N = 79). image The p value is very high, indicating that the Birch Effect is not at all significant at predicting Rs_annual in dry locations. As a note, these data are limited enough that I had to reduce the Birch Effect cutoff to -0.5 because otherwise there weren't any locations with a Birch Effect and the ANOVA test returned an error (see screenshot below). image After a combination of googling and experimenting, this error seems to be caused by there being zero instances of the Birch Effect in the data set. It's not a problem with 6,500 sites but it becomes an issue if you take away enough data.

image Here is the data for places with over 5000mm (197") of precipitation a year (N = 2506). The Birch Effect is very significant at predicting Rs_annual, even for very wet places. NOTE: The Birch Effect here was defined like normal, with the cutoff being -1 since there was no need to get more data. Here is a histogram of precipitation amounts in the whole Birch Effect data set image

bpbond commented 1 month ago

Interesting. OK, thanks @H-Stein , good info to have. I agree that with very small sample sizes we need to be careful about drawing any conclusions.

Histogram: those precipitation numbers are very high. Hmm.

H-Stein commented 1 month ago

Interesting. OK, thanks @H-Stein , good info to have. I agree that with very small sample sizes we need to be careful about drawing any conclusions.

Histogram: those precipitation numbers are very high. Hmm.

I was also a little confused by the high precipitation values. It's not like most of the data points are in the Amazon or anything. I'll double check the code today with some test locations as a sanity check and get back to you.

H-Stein commented 1 month ago

@bpbond found the issue image turns out that when I was getting the worldclim data and finding the mean temp and mean annual precipitation, I was including the ID column (a number ranging from 1 to 6500). Because of that, it was like there was some sort of insane month that averaged 3000 degrees, or another where it rained 6000mm. That should be fixed now though 😅

image Here are the new analysis results, not quite as low of a p value, but still very significant.

image And here's the new precipitation histgram... much better!

bpbond commented 1 month ago

Haha that would throw things off. Nice work finding and fixing it!