Closed cbuelo closed 6 years ago
@jzwart @rBatt @anyoneelse, want to check my logic in the commit comments I added?
Regarding all points except the GPP <- NEP + R
, I think much of this comes down to whether calculating 1 day's metabolism from hourly data should involve 24 or 25 observations. Jake and I seem to think only 1 midnight should be expected, whereas Jordan and Luke find 2 midnights to be more intuitive. Is that the crux of the issue?
The problem then manifests as whether or not freq
should reflect the number of "deltas", or the number of observations. There are two basic situations where we use freq
.
R <- mean(nep.night, na.rm=TRUE) * freq
. gas.flux <- (do.sat - do.obs) * (k.gas/freq) / z.mix
. Do those two situations both need the same convention for freq
?
I think they use the same convention. For each, they represent splitting the total change over a day (24 hour period) into a bunch of smaller changes. Because we're talking about quantifying (and splitting up) the change in DO at all these points, frequency should be defined as the number of deltaDO values we have.
For example, imagine a lake with no gas flux. NEP for a day would be change from the first observation to the last observation. Why would we pick 11PM as the last observation?
I just read through all of this again. I'm still of the philosophy that estimates labeled as being from DoY X should be derived only from observations made on DoY X, even though observations on DoY X+1 (or, ecologically speaking, days prior as well) may inform processes that occurred on DoY X.
I agree that the extra observation provides relevant information. Also, I agree with Jake that a user might desire a more general way of defining the time period over which metabolism is calculated.
"1 day of observations, 1 day of estimates". This is easy to understand, and easy to implement (applying a function to each unique calendar day, e.g.). It's the simplest way to define discrete time periods for these analyses. Estimates based on several days, start and end points that aren't midnight-based, etc, are all reasonable extensions. v2.0 imo
I received an email today that brought up this issue again and I had thought we had fixed this but now realize this is still open. The user is getting results that show GPP+R != NEP
, or what @cbuelo initially raised here.
Seems like we have a consensus on GPP = NEP - R
, but need to decide on whether or not freq
should be nobs
or number of 'deltas'. @lawinslow @rBatt any opinions? After reading through this again, I think I prefer the midnight to midnight approach, but it might be nice to have later versions of LM that have different fitting options - but that can be later.
From the bleacher seats here, but the convention is normally…
NEP = GPP – R
Paul Hanson Distinguished Research Professor UW Center for Limnology
From: Jake Zwart notifications@github.com<mailto:notifications@github.com> Reply-To: GLEON/LakeMetabolizer reply@reply.github.com<mailto:reply@reply.github.com> Date: Monday, March 12, 2018 at 8:40 PM To: GLEON/LakeMetabolizer LakeMetabolizer@noreply.github.com<mailto:LakeMetabolizer@noreply.github.com> Cc: Subscribed subscribed@noreply.github.com<mailto:subscribed@noreply.github.com> Subject: Re: [GLEON/LakeMetabolizer] potential metab.bookkeep fix (#121)
I received an email today that brought up this issue again and I had thought we had fixed this but now realize this is still open. The user is getting results that show GPP+R != NEP, or what @cbuelohttps://github.com/cbuelo initially raised herehttps://gist.github.com/cbuelo/20880449283fc3da6bec1f4e6cef1bb2.
Seems like we have a consensus on GPP = NEP - R, but need to decide on whether or not freq should be nobs or number of 'deltas'. @lawinslowhttps://github.com/lawinslow @rBatthttps://github.com/rbatt any opinions? After reading through this again, I think I prefer the midnight to midnight approach, but it might be nice to have later versions of LM that have different fitting options - but that can be later.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/GLEON/LakeMetabolizer/pull/121#issuecomment-372527495, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAY8fel_8-WB644G-vf-8tE37hyycJHKks5tdzGRgaJpZM4Jvljp.
Right @pchanson. Sorry I should've clarified but I meant how it's coded, given that R is negative in LakeMetabolizer R <- mean(nep.night, na.rm=TRUE) * freq # should be negative
Ah, got it, thanks
Sent from my U.S. Cellular® Smartphone
-------- Original message -------- From: Jake Zwart notifications@github.com Date: 3/13/18 6:41 AM (GMT-06:00) To: GLEON/LakeMetabolizer LakeMetabolizer@noreply.github.com Cc: Paul Hanson pchanson@wisc.edu, Mention mention@noreply.github.com Subject: Re: [GLEON/LakeMetabolizer] potential metab.bookkeep fix (#121)
Right @pchansonhttps://github.com/pchanson. Sorry I should've clarified but I meant how it's coded, given that R is negative in LakeMetabolizer R <- mean(nep.night, na.rm=TRUE) * freq # should be negative
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/GLEON/LakeMetabolizer/pull/121#issuecomment-372636793, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAY8fYYD-f2VBA44d0slnhVI_sNSnwjUks5td7BOgaJpZM4Jvljp.
Any more thoughts on this PR and these issues with bookkeeping?
I forgot about this discussion!
I'll summarize my thoughts in bullet form.
metab
should be altered so that it can supply midnight to midnight data (but only if requested, should probably be an option to enable backwards compatibility)
2.2 - Create options for metab.bookkeep
to give users option of how GPP/NEP/R are calculated.
2.3 - Update documentation to reflect these changes. How does that sound?
That sounds good to me but I'm a little unclear on your 2.2 point.
There is more than one way to calculate GPP, R, and NEP for bookkeeping. Should we give people the option of how it is calculated?
TLDR:
freq=nobs-1
(which I dub Cal 1); it isn't necessary, and messes up the units
GPP=NEP-R
NEP-R
, I'm not sure how to make sure that NEP = GPP + R
. Sorry.I put a good-faith effort into thinking this through ... more carefully than in the past. But I had to call it quits at some point, so errors may remain; my apologies if mistakes were made. See below for my logic, and examples.
Sorry for slow response. I just read through all the comments again, and looked back at @cbuelo 's original gist that I think prompted this pull request and discussion. It took a while to do that, let alone think it all through again.
First, I'll start by saying that I think there's something here that needs to be fixed.
Second, there seem to be three-four possible issues/ things to change (see TLDR above):
freq=nobs-1
NEP=GPP+R
As a way of considering these options and what might be wrong, I think it'd be great to return to Cal's original example. He had a great figure of a low-sampling-frequency DO time series to illustrate his concerns:
Some notes about the example:
In Cal's example, we see that oxygen in the last observation is the same as the first. Furthermore, the rate of O_2 change during the day is the same as at night.
Importantly, Cal's PR includes TWO changes:
freq = nobs-1
Let's look at the results when we don't make Cal's changes, when we make each of them individually, and when we make both changes together:
GPP | R | NEP | |
---|---|---|---|
Lake Metabolizer | 7 | -5 | 0 |
Cal1 (freq = nobs-1 ) |
6 | -4 | 0 |
Cal2 (GPP from daytime R) | 4 | -5 | 0 |
Cal1 & Cal2 | 4 | -4 | 0 |
Relative to the original LM estimates, we can see that (1) NEP remains the same across estimates; (2) Cal2 (calculating GPP from daytime R instead of full-day R) only affects GPP; and (3) Cal1 (freq=nobs-1
) affects both GPP and R. The two changes also interact with each other with respect to their impact on GPP (C1 reduces GPP by 1, C2 by 3, but C1+C2 reduce by 3 [not 4]).
Because "Cal 2" has a more isolated effect (calculating GPP from daytime R instead of nighttime R) than "Cal 1", we'll consider the former first. If we calculate GPP using only daytime NEP (which makes sense, and we do this), then we should also calculate GPP using only daytime R (which Cal 2 suggests we do). I agree with change Cal 2. (@lawinslow I used to hold a more nuanced perspective, but I've ~abandoned it.)
If we now assume that Cal 2 is a change that should be made, let's consider the 3rd row of that table (GPP=4, R=-5, NEP=0) as our 'default' scenario. At first glance, some may find these results perplexing because, here, GPP != NEP - R
. The Cal 1 change (freq=nobs-1
) would alter the result so that this statement would be true. Furthermore, the first and last 'observation' of the DO time series (in the figure above) are equal to each other, which suggests that NEP=0.
As Cal pointed out, the issue with freq
is that it represents the number of observations the user supplies per day, not the number of 'first-differences' of those observations. One way to reconcile this difference is to subtract 1 from the current value of freq
, as Cal did. However, we should consider what happens to the units of our metabolism estimates when we do this. Say we have a 4-hour sampling interval. That would give us 6 measurements per day, or nobs=freq=6
. We would then have freq-1=5
first-differences. Empirically, we'd have 5 time steps of 'NEP' values in units of something like "oxygen per 4 hours". In the end, our metabolism estimates are scaled to a 24-hour period. If we multiply "oxygen per 4 hours" by 5 (freq-1
), we have oxygen per 20 hours; if we multiply it by 6 (freq
), we have oxygen per 24 hours = oxygen per day. Clearly, freq-1
doesn't make sense when we calculate daily rates of R and NEP (see here for Cal's handy extraction of these steps in his example). [Note: the units issue of freq
also surfaces in adjusting units of K from daily to per time step, here]
If we had a 4-hr time step, we'd presumably have observations at 1) 00:00, 2) 04:00, 3) 08:00, 4) 12:00, 5) 16:00, and 6) 20:00. In this example, the problem we'd experience w/ freq
arises b/c freq
should actually represent the number of "mass per 4 hours" rates we have (this has been stated differently in earlier posts, and in this one, as "the number of first-differences"). As a result, it was previously suggested that that we could leave freq
the same, but increase our observations (i.e., to nobs=freq+1
) by adding a 7th observation at 24:00 (i.e., 00:00 of the next day). At this point, let's consider what Cal's figure would look like if we added that last observation:
Aha! With this time series, we'd still have the original value of freq
as calculated by calc.freq
(though note in the figure freq=5
, whereas I used freq=6
in the paragraph above), and using visual inspection and intuition, the exact GPP (4) and R (-5) values depicted in the table above for the "Cal 1" scenario: there are 3 nighttime intervals where oxygen decreases by 1, and 2 daytime intervals when oxygen increases by 1. So dO/dt at night = -1, NEP_day = 2 (dO over 2 time steps); this means that R_day = -2, and therefore that GPP_day = GPP = NEP_day - R_day = 4. Similarly, R_night = dO/dt_night 3 = -3 (dO over 3 time steps), and R = R_day + R_night = -2 + -3 = -5, which is the same as dO/dt_night freq = -1 * 5 = -5. Of critical importance is that last step: I can calculate R not just by adding together the nighttime and daytime R values, but by multiplying the average rate by freq
(the original freq
). In other words, and more to the point, I don't need the observation at the 'midnight' (24:00 or 00:00) of the following day (time step 6) to get the correct R value and to have freq
'make sense' ... I can take the rate we had between time steps 1 and 2, and between 4 and 5 (ignoring the 5 to 6), to calculate NEP_night, which can then be multiplied by freq
to give me 24-hour R. (Bonus: Another reason not to include the second midnight as that we'd be re-using the same data point when calculating metabolism over multiple days ... statistically, this observation shouldn't be given extra weight in our calculations.)
Therefore, I do not think we need to make change Cal 1. Furthermore, I do not think we need to include the midnight-to-midnight scenario wherein there would be 25 observations per day for hourly data.
The only point I've left unresolved is that NEP != GPP + R
, or maybe this would be better stated as GPP!=NEP-R
, because GPP is the most derived estimate there (well, NEP is the least derived). Other than just saying GPP = NEP - R
(not all the 'during day' stuff), I'm not sure what to suggest. Actually, here's the suggestion: don't do Cal 2, just set GPP to 24-hr-NEP minus 24-hr-R. (@lawinslow I believe this is what I was suggesting with my 'nuanced view' of how I think GPP should be calculated; we discussed this about 4.5 years ago, I'm sure you remember :p)
My 2 cents:
It seems like everyone agrees on the change to the GPP calculation (what Ryan dubbed "Cal 2") and that's probably the most important one. I think LakeMetabolizer returns inflated GPP estimates as it's including the nighttime indices in the calculation. I checked this by rerunning calculations on some "real" data (5 minute observations from a reservoir) and compared the GPP values. This is just making the change to the GPP calculation, not the freq=nobs-1
change or midnight-to-midnight.
I think I follow Ryan's 'Midnight to Midnight' example above and agree that for the current implementation (without the next day's 24:00/00:00 observation) it should be freq=nobs
.
For freq=nobs-1
I think it boils down to if the next day's midnight should be included or not. Ignoring how it's currently implemented and just thinking conceptually, I think it should be. Taking the 4 hour sampling interval example, if you assign the 00:00 - 20:00 measurements to day X, then the next measurement at 24:00/00:00 would be assigned to day X + 1. But the change in DO from 20:00 - 24:00 that metabolism is calculated from is all occurring on day X; if you don't include the next day's 24:00/00:00 you're throwing all that out. I don't think re-using that point should be an issue; yes that point gets re-used but the deltaDO values on either side of it are what goes into the calculation.
All that said, implementing midnight-to-midnight might get complicated, especially if you don't have a measurement exactly at 00:00. It also should make less and less of a difference as sampling frequency increases.
Generally agreed with the points being made. One thing that is very clear to me now is that this model isn't input-data agnostic, and that your interpretation of the output somewhat depends on your inputs (moreso than other models here).
In the spirit of moving this forward, I propose we incorporate the Cal2
change. And then, make an issue referencing this PR indicating that metab.bookkeep
currently doesn't use both midnights, and that some might interpret that as a better representation. But to do that, we would need to change both this code, the metab
code, and update documentation to make it clear.
How does that sound? @cbuelo if you are on board with this, could you modify your PR to include just the one change, and we'll merge that in.
I agree and thanks for the discussion and explanation @rBatt @cbuelo @lawinslow.
In the spirit of moving this forward, I propose we incorporate the Cal2 change.
Yup I agree.
@cbuelo Would you be able to modify your PR to represent just the Cal2 change?
Just to clarify 1 point (that isn't critical at this juncture): if you want NEP = GPP + R, then I think you (I, we) should calculate GPP as NEP - R.
Sometimes people don't like this approach because they only want to use daytime NEP and daytime R to calculate GPP, because GPP should only happen in the daytime.
The way I see it, you could argue that increase in oxygen at night are random processes (mean 0) that occur throughout the 24 hour cycle, and there should be, with large sample size just as many noise-induced increases as there are noise-induced decreases, both during the day and at night. So maybe just use all the data, and hope it balances out better.
On Thu, Mar 29, 2018 at 1:04 PM, Jake Zwart notifications@github.com wrote:
I agree and thanks for the discussion and explanation @rBatt https://github.com/rBatt @cbuelo https://github.com/cbuelo @lawinslow https://github.com/lawinslow.
In the spirit of moving this forward, I propose we incorporate the Cal2 change. Yup I agree.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/GLEON/LakeMetabolizer/pull/121#issuecomment-377303657, or mute the thread https://github.com/notifications/unsubscribe-auth/AFXrfzTT3Vy2MDlqkB0QNoottK7V8Lvuks5tjRQQgaJpZM4Jvljp .
-- Ryan D. Batt, Ph.D. Aquatic Ecologist Postdoctoral Fellow, National Academy of Sciences, US EPA Website: batt.limnology.wisc.edu Pronouns: he/ him
Alright I made the changes in my fork, which I think updates the pull request? Let me know if I need to do anything else.
:+1: Thanks @cbuelo
We should throw a party for having this PR merged ...
On Fri, Mar 30, 2018 at 12:37 PM, Luke Winslow notifications@github.com wrote:
Merged #121 https://github.com/GLEON/LakeMetabolizer/pull/121.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/GLEON/LakeMetabolizer/pull/121#event-1549994691, or mute the thread https://github.com/notifications/unsubscribe-auth/AFXrf1LpJNRP2d_4SKHHn_OHkCkx3Sakks5tjl9CgaJpZM4Jvljp .
-- Ryan D. Batt, Ph.D. Aquatic Ecologist Postdoctoral Fellow, National Academy of Sciences, US EPA Website: batt.limnology.wisc.edu Pronouns: he/ him
:balloon: :birthday: :gift: :tada: !! :100:
Subtract out daytime respiration instead of total daily respiration when calculating GPP.
Subtract one from freq to account for differencing DO time series.
Example illustrating issue here.
@rBatt @lawinslow