GLEON / LakeMetabolizer

Collection of Lake Metabolism Functions
16 stars 10 forks source link

potential metab.bookkeep fix #121

Closed cbuelo closed 6 years ago

cbuelo commented 8 years ago

Subtract out daytime respiration instead of total daily respiration when calculating GPP.

Subtract one from freq to account for differencing DO time series.

Example illustrating issue here.

@rBatt @lawinslow

lawinslow commented 8 years ago

@jzwart @rBatt @anyoneelse, want to check my logic in the commit comments I added?

rBatt commented 8 years ago

Regarding all points except the GPP <- NEP + R, I think much of this comes down to whether calculating 1 day's metabolism from hourly data should involve 24 or 25 observations. Jake and I seem to think only 1 midnight should be expected, whereas Jordan and Luke find 2 midnights to be more intuitive. Is that the crux of the issue?

The problem then manifests as whether or not freq should reflect the number of "deltas", or the number of observations. There are two basic situations where we use freq.

Do those two situations both need the same convention for freq?

lawinslow commented 8 years ago

I think they use the same convention. For each, they represent splitting the total change over a day (24 hour period) into a bunch of smaller changes. Because we're talking about quantifying (and splitting up) the change in DO at all these points, frequency should be defined as the number of deltaDO values we have.

For example, imagine a lake with no gas flux. NEP for a day would be change from the first observation to the last observation. Why would we pick 11PM as the last observation?

rBatt commented 7 years ago

I just read through all of this again. I'm still of the philosophy that estimates labeled as being from DoY X should be derived only from observations made on DoY X, even though observations on DoY X+1 (or, ecologically speaking, days prior as well) may inform processes that occurred on DoY X.

I agree that the extra observation provides relevant information. Also, I agree with Jake that a user might desire a more general way of defining the time period over which metabolism is calculated.

"1 day of observations, 1 day of estimates". This is easy to understand, and easy to implement (applying a function to each unique calendar day, e.g.). It's the simplest way to define discrete time periods for these analyses. Estimates based on several days, start and end points that aren't midnight-based, etc, are all reasonable extensions. v2.0 imo

jzwart commented 6 years ago

I received an email today that brought up this issue again and I had thought we had fixed this but now realize this is still open. The user is getting results that show GPP+R != NEP, or what @cbuelo initially raised here.

Seems like we have a consensus on GPP = NEP - R, but need to decide on whether or not freq should be nobs or number of 'deltas'. @lawinslow @rBatt any opinions? After reading through this again, I think I prefer the midnight to midnight approach, but it might be nice to have later versions of LM that have different fitting options - but that can be later.

pchanson commented 6 years ago

From the bleacher seats here, but the convention is normally…

NEP = GPP – R


Paul Hanson Distinguished Research Professor UW Center for Limnology

From: Jake Zwart notifications@github.com<mailto:notifications@github.com> Reply-To: GLEON/LakeMetabolizer reply@reply.github.com<mailto:reply@reply.github.com> Date: Monday, March 12, 2018 at 8:40 PM To: GLEON/LakeMetabolizer LakeMetabolizer@noreply.github.com<mailto:LakeMetabolizer@noreply.github.com> Cc: Subscribed subscribed@noreply.github.com<mailto:subscribed@noreply.github.com> Subject: Re: [GLEON/LakeMetabolizer] potential metab.bookkeep fix (#121)

I received an email today that brought up this issue again and I had thought we had fixed this but now realize this is still open. The user is getting results that show GPP+R != NEP, or what @cbuelohttps://github.com/cbuelo initially raised herehttps://gist.github.com/cbuelo/20880449283fc3da6bec1f4e6cef1bb2.

Seems like we have a consensus on GPP = NEP - R, but need to decide on whether or not freq should be nobs or number of 'deltas'. @lawinslowhttps://github.com/lawinslow @rBatthttps://github.com/rbatt any opinions? After reading through this again, I think I prefer the midnight to midnight approach, but it might be nice to have later versions of LM that have different fitting options - but that can be later.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/GLEON/LakeMetabolizer/pull/121#issuecomment-372527495, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAY8fel_8-WB644G-vf-8tE37hyycJHKks5tdzGRgaJpZM4Jvljp.

jzwart commented 6 years ago

Right @pchanson. Sorry I should've clarified but I meant how it's coded, given that R is negative in LakeMetabolizer R <- mean(nep.night, na.rm=TRUE) * freq # should be negative

pchanson commented 6 years ago

Ah, got it, thanks

Sent from my U.S. Cellular® Smartphone

-------- Original message -------- From: Jake Zwart notifications@github.com Date: 3/13/18 6:41 AM (GMT-06:00) To: GLEON/LakeMetabolizer LakeMetabolizer@noreply.github.com Cc: Paul Hanson pchanson@wisc.edu, Mention mention@noreply.github.com Subject: Re: [GLEON/LakeMetabolizer] potential metab.bookkeep fix (#121)

Right @pchansonhttps://github.com/pchanson. Sorry I should've clarified but I meant how it's coded, given that R is negative in LakeMetabolizer R <- mean(nep.night, na.rm=TRUE) * freq # should be negative

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/GLEON/LakeMetabolizer/pull/121#issuecomment-372636793, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AAY8fYYD-f2VBA44d0slnhVI_sNSnwjUks5td7BOgaJpZM4Jvljp.

jzwart commented 6 years ago

Any more thoughts on this PR and these issues with bookkeeping?

lawinslow commented 6 years ago

I forgot about this discussion!

I'll summarize my thoughts in bullet form.

  1. I think we should merge this PR. Note, this will change the results for people when they update.
  2. We should then setup issues to deal with the lingering issues. Lingering issues as I see them: 2.1 - metab should be altered so that it can supply midnight to midnight data (but only if requested, should probably be an option to enable backwards compatibility) 2.2 - Create options for metab.bookkeep to give users option of how GPP/NEP/R are calculated. 2.3 - Update documentation to reflect these changes.

How does that sound?

jzwart commented 6 years ago

That sounds good to me but I'm a little unclear on your 2.2 point.

lawinslow commented 6 years ago

There is more than one way to calculate GPP, R, and NEP for bookkeeping. Should we give people the option of how it is calculated?

rBatt commented 6 years ago

TLDR:

  1. Don't make the change of freq=nobs-1 (which I dub Cal 1); it isn't necessary, and messes up the units
    • related: don't make the change of include the next day's midnight; it isn't necessary. But happy to discuss.
  2. Make the change of calculating GPP from R_day and NEP_day (which I dub Cal 2); it makes sense, though can cause problems if we want GPP=NEP-R
  3. Other than just calculating GPP as NEP-R, I'm not sure how to make sure that NEP = GPP + R. Sorry.

I put a good-faith effort into thinking this through ... more carefully than in the past. But I had to call it quits at some point, so errors may remain; my apologies if mistakes were made. See below for my logic, and examples.


Sorry for slow response. I just read through all the comments again, and looked back at @cbuelo 's original gist that I think prompted this pull request and discussion. It took a while to do that, let alone think it all through again.

First, I'll start by saying that I think there's something here that needs to be fixed.

Second, there seem to be three-four possible issues/ things to change (see TLDR above):

  1. freq=nobs-1
  2. 'midnight to midnight' (25 obs for a day of hourly data)
  3. calculate GPP from daytime R, not 24-hr
  4. we (should?) expect NEP=GPP+R

As a way of considering these options and what might be wrong, I think it'd be great to return to Cal's original example. He had a great figure of a low-sampling-frequency DO time series to illustrate his concerns:
calheuristic Some notes about the example:

In Cal's example, we see that oxygen in the last observation is the same as the first. Furthermore, the rate of O_2 change during the day is the same as at night.

Importantly, Cal's PR includes TWO changes:

  1. freq = nobs-1
  2. Calculate GPP from daytime R, not from 24-hr R

Let's look at the results when we don't make Cal's changes, when we make each of them individually, and when we make both changes together:

GPP R NEP
Lake Metabolizer 7 -5 0
Cal1 (freq = nobs-1) 6 -4 0
Cal2 (GPP from daytime R) 4 -5 0
Cal1 & Cal2 4 -4 0

Relative to the original LM estimates, we can see that (1) NEP remains the same across estimates; (2) Cal2 (calculating GPP from daytime R instead of full-day R) only affects GPP; and (3) Cal1 (freq=nobs-1) affects both GPP and R. The two changes also interact with each other with respect to their impact on GPP (C1 reduces GPP by 1, C2 by 3, but C1+C2 reduce by 3 [not 4]).

Because "Cal 2" has a more isolated effect (calculating GPP from daytime R instead of nighttime R) than "Cal 1", we'll consider the former first. If we calculate GPP using only daytime NEP (which makes sense, and we do this), then we should also calculate GPP using only daytime R (which Cal 2 suggests we do). I agree with change Cal 2. (@lawinslow I used to hold a more nuanced perspective, but I've ~abandoned it.)

If we now assume that Cal 2 is a change that should be made, let's consider the 3rd row of that table (GPP=4, R=-5, NEP=0) as our 'default' scenario. At first glance, some may find these results perplexing because, here, GPP != NEP - R. The Cal 1 change (freq=nobs-1) would alter the result so that this statement would be true. Furthermore, the first and last 'observation' of the DO time series (in the figure above) are equal to each other, which suggests that NEP=0.

As Cal pointed out, the issue with freq is that it represents the number of observations the user supplies per day, not the number of 'first-differences' of those observations. One way to reconcile this difference is to subtract 1 from the current value of freq, as Cal did. However, we should consider what happens to the units of our metabolism estimates when we do this. Say we have a 4-hour sampling interval. That would give us 6 measurements per day, or nobs=freq=6. We would then have freq-1=5 first-differences. Empirically, we'd have 5 time steps of 'NEP' values in units of something like "oxygen per 4 hours". In the end, our metabolism estimates are scaled to a 24-hour period. If we multiply "oxygen per 4 hours" by 5 (freq-1), we have oxygen per 20 hours; if we multiply it by 6 (freq), we have oxygen per 24 hours = oxygen per day. Clearly, freq-1 doesn't make sense when we calculate daily rates of R and NEP (see here for Cal's handy extraction of these steps in his example). [Note: the units issue of freq also surfaces in adjusting units of K from daily to per time step, here]

If we had a 4-hr time step, we'd presumably have observations at 1) 00:00, 2) 04:00, 3) 08:00, 4) 12:00, 5) 16:00, and 6) 20:00. In this example, the problem we'd experience w/ freq arises b/c freq should actually represent the number of "mass per 4 hours" rates we have (this has been stated differently in earlier posts, and in this one, as "the number of first-differences"). As a result, it was previously suggested that that we could leave freq the same, but increase our observations (i.e., to nobs=freq+1) by adding a 7th observation at 24:00 (i.e., 00:00 of the next day). At this point, let's consider what Cal's figure would look like if we added that last observation:

lm_github_annoying_resolved

Aha! With this time series, we'd still have the original value of freq as calculated by calc.freq (though note in the figure freq=5, whereas I used freq=6 in the paragraph above), and using visual inspection and intuition, the exact GPP (4) and R (-5) values depicted in the table above for the "Cal 1" scenario: there are 3 nighttime intervals where oxygen decreases by 1, and 2 daytime intervals when oxygen increases by 1. So dO/dt at night = -1, NEP_day = 2 (dO over 2 time steps); this means that R_day = -2, and therefore that GPP_day = GPP = NEP_day - R_day = 4. Similarly, R_night = dO/dt_night 3 = -3 (dO over 3 time steps), and R = R_day + R_night = -2 + -3 = -5, which is the same as dO/dt_night freq = -1 * 5 = -5. Of critical importance is that last step: I can calculate R not just by adding together the nighttime and daytime R values, but by multiplying the average rate by freq (the original freq). In other words, and more to the point, I don't need the observation at the 'midnight' (24:00 or 00:00) of the following day (time step 6) to get the correct R value and to have freq 'make sense' ... I can take the rate we had between time steps 1 and 2, and between 4 and 5 (ignoring the 5 to 6), to calculate NEP_night, which can then be multiplied by freq to give me 24-hour R. (Bonus: Another reason not to include the second midnight as that we'd be re-using the same data point when calculating metabolism over multiple days ... statistically, this observation shouldn't be given extra weight in our calculations.)

Therefore, I do not think we need to make change Cal 1. Furthermore, I do not think we need to include the midnight-to-midnight scenario wherein there would be 25 observations per day for hourly data.

The only point I've left unresolved is that NEP != GPP + R, or maybe this would be better stated as GPP!=NEP-R, because GPP is the most derived estimate there (well, NEP is the least derived). Other than just saying GPP = NEP - R (not all the 'during day' stuff), I'm not sure what to suggest. Actually, here's the suggestion: don't do Cal 2, just set GPP to 24-hr-NEP minus 24-hr-R. (@lawinslow I believe this is what I was suggesting with my 'nuanced view' of how I think GPP should be calculated; we discussed this about 4.5 years ago, I'm sure you remember :p)

cbuelo commented 6 years ago

My 2 cents:

lawinslow commented 6 years ago

Generally agreed with the points being made. One thing that is very clear to me now is that this model isn't input-data agnostic, and that your interpretation of the output somewhat depends on your inputs (moreso than other models here).

In the spirit of moving this forward, I propose we incorporate the Cal2 change. And then, make an issue referencing this PR indicating that metab.bookkeep currently doesn't use both midnights, and that some might interpret that as a better representation. But to do that, we would need to change both this code, the metab code, and update documentation to make it clear.

How does that sound? @cbuelo if you are on board with this, could you modify your PR to include just the one change, and we'll merge that in.

jzwart commented 6 years ago

I agree and thanks for the discussion and explanation @rBatt @cbuelo @lawinslow.

In the spirit of moving this forward, I propose we incorporate the Cal2 change.

Yup I agree.

lawinslow commented 6 years ago

@cbuelo Would you be able to modify your PR to represent just the Cal2 change?

rBatt commented 6 years ago

Just to clarify 1 point (that isn't critical at this juncture): if you want NEP = GPP + R, then I think you (I, we) should calculate GPP as NEP - R.

Sometimes people don't like this approach because they only want to use daytime NEP and daytime R to calculate GPP, because GPP should only happen in the daytime.

The way I see it, you could argue that increase in oxygen at night are random processes (mean 0) that occur throughout the 24 hour cycle, and there should be, with large sample size just as many noise-induced increases as there are noise-induced decreases, both during the day and at night. So maybe just use all the data, and hope it balances out better.

On Thu, Mar 29, 2018 at 1:04 PM, Jake Zwart notifications@github.com wrote:

I agree and thanks for the discussion and explanation @rBatt https://github.com/rBatt @cbuelo https://github.com/cbuelo @lawinslow https://github.com/lawinslow.

In the spirit of moving this forward, I propose we incorporate the Cal2 change. Yup I agree.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/GLEON/LakeMetabolizer/pull/121#issuecomment-377303657, or mute the thread https://github.com/notifications/unsubscribe-auth/AFXrfzTT3Vy2MDlqkB0QNoottK7V8Lvuks5tjRQQgaJpZM4Jvljp .

-- Ryan D. Batt, Ph.D. Aquatic Ecologist Postdoctoral Fellow, National Academy of Sciences, US EPA Website: batt.limnology.wisc.edu Pronouns: he/ him

cbuelo commented 6 years ago

Alright I made the changes in my fork, which I think updates the pull request? Let me know if I need to do anything else.

lawinslow commented 6 years ago

:+1: Thanks @cbuelo

rBatt commented 6 years ago

We should throw a party for having this PR merged ...

On Fri, Mar 30, 2018 at 12:37 PM, Luke Winslow notifications@github.com wrote:

Merged #121 https://github.com/GLEON/LakeMetabolizer/pull/121.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/GLEON/LakeMetabolizer/pull/121#event-1549994691, or mute the thread https://github.com/notifications/unsubscribe-auth/AFXrf1LpJNRP2d_4SKHHn_OHkCkx3Sakks5tjl9CgaJpZM4Jvljp .

-- Ryan D. Batt, Ph.D. Aquatic Ecologist Postdoctoral Fellow, National Academy of Sciences, US EPA Website: batt.limnology.wisc.edu Pronouns: he/ him

lawinslow commented 6 years ago

:balloon: :birthday: :gift: :tada: !! :100: