Closed lsigut closed 3 years ago
When dealing with empirical autocorrelation, it is important to keep the equidistant time steps. Filtering the data before will most probably introduce some faults. Do you work with equidistant steps in your results using base functions?
Yes, the functions are now available in openeddy. You can check openeddy::agg_fsd
and openeddy::agg_DT_SD
functions and documentation in the package github dev version. They are finished now. The regular timestamp is required, only the measured data are used for nEff
estimation. The only problem is that in some cases there is not enough values per aggregation period. In that case NA
s are produced. The main advantage of those functions is that they evaluate uncertainty for all variables available using autodetection. That way you provide REddyProc full output and extract uncertainties for all meteo and uStar scenarios at once for given aggregation interval (see the examples in documentation).
Do you have any specific suggestions, what should be still done?
Does not seem to me that the vignette code changed. If I remember, maybe it requires two separate pipes. One for nEff, second for fsd aggregation. I am not familiar with pipes though.
@lsigut: I updated the aggregation vignette
Can you, please, have a look at commit 9b929cef7dc1656e99525e621aab73e5fb8cc38d and see if this resolved the disagreement with your calculations.
The fix is not successful because the quality filter does not work as intended. See that the outcome is actually identical to not including it at all:
results <- EProc$sExportResults() %>%
mutate(resid = ifelse(NEE_uStar_fqc == 0, NEE_uStar_orig - NEE_uStar_fall, NA)) %>%
mutate(resid2 = NEE_uStar_orig - NEE_uStar_fall)
identical(results$resid, results$resid2[1:17520])
# resid2 has only additional attributes (stripped here by subsetting)
When comparing to your partial results on the daily scale, I have identical nEff
. Therefore the problem must be that QC should be done only on _fsd
columns (e.g. NEE_uStar_fsd
). The problem can be better understood if you check header(results)
. Since NEE_uStar_orig
is already effectively filtered for quality, it will also affect the product of NEE_uStar_orig - NEE_uStar_fall
. Additionally, if you check aggDay
, our results are identical at 6th and 7th row, where nRec == 48
, i.e. no QC filtering needed.
Here is a more complete code where you can check the differences:
library(REddyProc)
library(dplyr)
library(lognorm)
library(openeddy)
EddyDataWithPosix <- Example_DETha98 %>%
filterLongRuns("NEE") %>%
fConvertTimeToPosix('YDH',Year = 'Year',Day = 'DoY', Hour = 'Hour')
EProc <- sEddyProc$new(
'DE-Tha', EddyDataWithPosix, c('NEE','Rg','Tair','VPD', 'Ustar'))
EProc$sMDSGapFillAfterUstar('NEE', uStarTh = 0.3, FillAll = TRUE)
results <- EProc$sExportResults()
summary(results$NEE_uStar_fsd)
results %>% filter(NEE_uStar_fqc == 0) %>% summarise(
nRec = sum(is.finite(NEE_uStar_f))
, varSum = sum(NEE_uStar_fsd^2, na.rm = TRUE)
, seMean = sqrt(varSum) / nRec
, seMeanApprox = mean(NEE_uStar_fsd, na.rma = TRUE) / sqrt(nRec)
) %>% select(nRec, seMean, seMeanApprox)
# this step can be removed (or better amended to filter NEE_uStar_fsd instead)
results <- EProc$sExportResults() %>%
mutate(
resid = ifelse(NEE_uStar_fqc == 0, NEE_uStar_orig - NEE_uStar_fall, NA )
)
autoCorr <- computeEffectiveAutoCorr(results$resid)
nEff <- computeEffectiveNumObs(results$resid, na.rm = TRUE)
c( nEff = nEff, nObs = sum(is.finite(results$resid)))
results %>% filter(NEE_uStar_fqc == 0) %>% summarise(
nRec = sum(is.finite(NEE_uStar_fsd))
, varMean = sum(NEE_uStar_fsd^2, na.rm = TRUE) / nRec / (!!nEff - 1)
, seMean = sqrt(varMean)
#, seMean2 = sqrt(mean(NEE_uStar_fsd^2, na.rm = TRUE)) / sqrt(!!nEff - 1)
, seMeanApprox = mean(NEE_uStar_fsd, na.rm = TRUE) / sqrt(!!nEff - 1)
) %>% select(seMean, seMeanApprox)
results <- results %>% mutate(
DateTime = EddyDataWithPosix$DateTime
, DoY = as.POSIXlt(DateTime - 15*60)$yday # midnight belongs to the previous
)
# openeddy results for comparison (yearly scale):
results$timestamp <- results$DateTime - 60*15
year <- agg_fsd(results, "%Y", agg_per = "year-1")
year
aggDay <- results %>% group_by(DoY) %>%
summarise(
DateTime = first(DateTime)
, nRec = sum( NEE_uStar_fqc == 0, na.rm = TRUE)
, nEff = computeEffectiveNumObs(
resid, effAcf = !!autoCorr, na.rm = TRUE)
, NEE = mean(NEE_uStar_f, na.rm = TRUE)
, sdNEE = if (nEff <= 1) NA_real_ else sqrt(
mean(NEE_uStar_fsd^2, na.rm = TRUE) / (nEff - 1))
, sdNEEuncorr = if (nRec == 0) NA_real_ else sqrt(
mean(NEE_uStar_fsd^2, na.rm = TRUE) / (nRec - 1))
)
aggDay
# openeddy results for comparison (daily scale):
results$timestamp <- results$DateTime - 60*15
day <- agg_fsd(results, "%Y-%m-%d", agg_per = "day-1")
plot(day$mean$NEE_uStar_fsd_mean, aggDay$sdNEE)
day$mean[1:10, "NEE_uStar_fsd_mean"]
aggDay$sdNEE[1:10]
lines(0:6, 0:6)
For computing mean and sd we need to filter for the quality flag. However, for computing the effective number of observations we are not allowed to remove records from the time series, i.e. filter. Hence, I introduced a bug with the previous commit. Please, have a look at commit fd448e2d60915082857936b5c2e3a8749e9a58a, if this is now correct and better explained.
That is still wrong. Getting into the topic again I realize that we aim for:
I agree that it is a bit tricky and I spent some time to implement it in openeddy based on your advices. I had also hard times with the statistics though, not only with the data wrangling, so thank you for explanations! I think what really helps is that *_orig
is always already QC filtered. So which records are measured and which are "predicted" is already specified and the fact can be used during computations. What I did was to make a single computation algorithm that works for all arbitrary time intervals (including yearly and daily). Then I checked against your results. I think that what I noticed was that my results would agree on yearly estimates, but not daily. It is a long time ago, but I assume that if I changed the algorithm to produce the same results on the daily resolution, I would likely not arrive to same yearly results as yours. But I obviously still cannot say I did not overlook anything 100%. I was just hoping that if both of us use different algorithms and arrive at the same results just agreeing on the principles/requirements, it would validate the approach. I think that the main complication was the requirement to compute autocorrelation on the complete dataset. Another question could be, what if 10 years of data are processed. But I imagine that computing autocorrelation from the whole period will only make the estimate more robust as it should be rather a general property of the type of measurement. Notice I do not want to force any change and I would prefer if you arrived to the same conclusion independently because that would be the real validation. If I understand well, your last comment here is in line with what I was suggesting in my last comment.
I created a version (28505eb08746616c31293c7669b9f1a5c09036d9) where I removed all the filtering but instead created columns with explicit NA in for uncertainty of gap-filled records. This is simpler and hopefully explaining the issues more clearly. I am looking forward to critical feedback.
Thanks for taking time to sit together and confirming that the issue is resolved with the last commit.
Hi, I was comparing with my results using base R functions and could not arrive to those presented for daily aggregation in vignette. After some testing I noticed that
%>% filter(NEE_uStar_fqc == 0)
is missing in the pipe, i.e. all daily values ofNEE_uStar_fsd
are used for aggregation without filtering. I confirmed when I removed the filtering in my code. Unfortunately, the solution is not so easy as including above filter in the pipe will also filternEff
that will again affect the uncertainty results.