bgctw / REddyProc

Processing data from micrometeorological Eddy-Covariance systems

"Time stamp is not equidistant" when data are indeed half-hourly #54

Closed arthur-e closed 2 years ago

arthur-e commented 2 years ago

I'm trying to process Ameriflux half-hourly flux data, for example for site US-CS1 and the file:

AMF_US-CS1_BASE_HH_2-5.csv

This file can be downloaded from Ameriflux. The data are indeed half-hourly and, though it's inconvenient, I used expand.grid() and a left_join() to guarantee that they are on half-hourly steps (i.e., the data are joined to a half-hourly time series by their date and hour of day).

library(REddyProc)
library(dplyr)  # provides %>%, select(), mutate(), left_join(), arrange()

filename <- '~/Downloads/Ameriflux/extracts/AMF_US-CS1_BASE_HH_2-5.csv'

# Extract time of day, normalize field names
df0 <- read.csv(filename, skip = 2) %>%
  select(date = TIMESTAMP_START, NEE = NEE_PI_F, Rg = SW_IN_1_1_1, 
    Tair = TA_1_1_1, VPD = VPD_PI_1_1_1, Ustar = USTAR_1_1_1) %>%
  mutate(time = substr(date, 9, 12)) %>%
  mutate(hour = as.integer(substr(time, 1, 2)) + as.integer(substr(time, 3, 4)) / 60) %>%
  # The first 8 characters of the time stamp are YYYYMMDD
  mutate(date = as.Date(substr(as.character(date), 1, 8), format = '%Y%m%d'))

# Create a continuous, half-hourly time series
df <- expand.grid(date = seq.Date(from = min(df0$date), to = max(df0$date), by = 'day'), 
    hour = seq(0, 23.5, 0.5)) %>%
  # Join the original data to this series
  left_join(df0, by = c('date', 'hour')) %>%
  arrange(date, hour) %>%
  # Extract meaningful date/time components for use in fConvertTimeToPosix()
  mutate(year = format(date, '%Y'), month = format(date, '%m'), day = format(date, '%d'),
    minute = ifelse(hour %% 1 == 0, 30, 0)) %>%
  mutate(hour = floor(hour)) %>%
  arrange(date, hour) %>%
  mutate_at(.vars = vars(year, month, day), as.integer) %>%
  select(date, year, month, day, hour, minute, everything()) %>%
  select(-time) %>%
  # Fill in missing data with NA
  mutate_all(.funs = function (x) ifelse(x == -9999, NA, x)) %>%
  # Mask negative short-wave radiation values
  mutate(Rg = ifelse(Rg < 0, NA, Rg)) %>%
  fConvertTimeToPosix(
    TFormat = 'YMDHM', Year = 'year', Month = 'month', Day = 'day', Hour = 'hour', Min = 'minute')

We can confirm the first few rows (at least) are half-hourly:

head(df)
             DateTime  date year month day hour minute NEE Rg Tair VPD Ustar
1 2018-01-01 00:30:00 17532 2018     1   1    0     30  NA NA   NA  NA    NA
2 2018-01-01 00:00:00 17532 2018     1   1    0      0  NA NA   NA  NA    NA
3 2018-01-01 01:30:00 17532 2018     1   1    1     30  NA NA   NA  NA    NA
4 2018-01-01 01:00:00 17532 2018     1   1    1      0  NA NA   NA  NA    NA
5 2018-01-01 02:30:00 17532 2018     1   1    2     30  NA NA   NA  NA    NA
6 2018-01-01 02:00:00 17532 2018     1   1    2      0  NA NA   NA  NA    NA
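
Inspecting head() only covers a handful of rows. A minimal sketch of a check over the whole column, assuming a POSIXct vector of time stamps, compares each time stamp with the one before it. The helper name is_half_hourly and the sample values are hypothetical and are not part of REddyProc.

```r
# Hypothetical helper: TRUE only if every consecutive step is exactly +30 minutes
is_half_hourly <- function(stamps) {
  steps <- as.numeric(diff(stamps), units = "mins")
  all(steps == 30)
}

# Hypothetical sample: a regular half-hourly series starting at 00:30 UTC
good <- seq(as.POSIXct("2018-01-01 00:30:00", tz = "UTC"),
            by = "30 min", length.out = 6)

# The same time stamps with the first two rows swapped
bad <- good[c(2, 1, 3, 4, 5, 6)]

is_half_hourly(good)
is_half_hourly(bad)
```

Because the comparison is between neighbouring rows, the check is sensitive to row order as well as to gaps, so it fails for a series that has all the right time stamps in the wrong order.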

However, when I try to create an instance of sEddyProc...

sEddyProc$new('US-CS1', df)

I get the error message:

Error in fCheckHHTimeSeries(Data[[ColPOSIXTime]], DTS = DTS, "sEddyProc.initialize") : 
  sEddyProc.initialize:::fCheckHHTimeSeries::: Time stamp is not equidistant (half-)hours in rows: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 2
arthur-e commented 2 years ago

It seems that the input data frame has to be arranged in chronological order. I was really frustrated with all the seemingly arbitrary requirements (e.g., for half-hourly data, each day must start at 30 minutes past the hour), so I hacked this together a little too quickly. The easiest fix seems to be to use lubridate to add 30 minutes to each time stamp.

  ...
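  fConvertTimeToPosix(
    TFormat = 'YMDHM', Year = 'year', Month = 'month', Day = 'day', Hour = 'hour', Min = 'minute') %>%
  # Shift each time stamp by 30 minutes
  mutate(DateTime = DateTime + lubridate::dminutes(30)) %>%
  # Sort chronologically; DateTime only exists after fConvertTimeToPosix()
  arrange(DateTime)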
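
For comparison, here is a rough base-R sketch of the same idea without the intermediate year/month/day columns. It assumes the time stamps are 12-character YYYYMMDDHHMM strings marking the start of each half hour, as in the TIMESTAMP_START column. The data frame obs and its NEE values are made up for illustration.

```r
# Hypothetical, deliberately unsorted period-start stamps with made-up NEE values
obs <- data.frame(
  TIMESTAMP_START = c("201801010030", "201801010000", "201801010100"),
  NEE = c(0.2, 0.1, 0.3),
  stringsAsFactors = FALSE
)

# Parse in a fixed time zone, then shift by 30 minutes (1800 seconds)
# so each row is labelled with the end of its half hour
obs$DateTime <- as.POSIXct(obs$TIMESTAMP_START,
                           format = "%Y%m%d%H%M", tz = "UTC") + 30 * 60

# Sort whole rows chronologically so values stay attached to their stamps
obs <- obs[order(obs$DateTime), ]

format(obs$DateTime, "%Y-%m-%d %H:%M")
```

Sorting the rows rather than the time column alone keeps each measurement with its own time stamp, and the first row then falls at 00:30, matching the requirement mentioned above.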