DOI-USGS / streamMetabolizer

streamMetabolizer uses inverse modeling to estimate aquatic metabolism (photosynthesis and respiration) from time series data on dissolved oxygen, water temperature, depth, and light.
http://usgs-r.github.io/streamMetabolizer/
Other
36 stars 22 forks source link

wrong DO.sat predictions / positive ER estimates #367

Closed mluerig closed 6 years ago

mluerig commented 6 years ago

I am using 15 min interval environmental time series data to model GPP and ER over several consecutive days (a formatted sample is attached). However, I am getting some really weird results. There are these extreme jumps in ER from negative to positive (ER can't be positive, no?), and also predictions for O2-sat seems off - or is this just a labelling-error (see attached plot - 10-11%)?

mod.out.txt mod.data.txt

I am not quite sure where to start. I have only been playing around with "mle" models, not even sure is my data is fine the way it is. E.g. light - does it have to be smoothed more (I have been using raw)? If you have time to have a look or just point me in the right direction I would really appreciate it.

do_pred

robohall commented 6 years ago

Mortiz,

A few things to try:

  1. DO is saturated at night—this is likely a probe calibration problem, or very high gas exchange supersaturating the O2. If DO > saturated at nigh, you wlll always get positive ER. There is some math to this effect in Hall et al.2015, L&O.
  2. You are getting very small DO diel changes making it difficult to resolve ER and K using a model.
  3. I do not know why the middle panel says DO (%sat), but has values of 10.5. Check to make sure that you are calculating the DO sat correctly. Those values are more in line of what I would expect DO.sat values to be, which in themselves are not that useful to plot.
  4. K values are too low from what I expect in a stream. What do you expect K to be in this stream using e.g. Raymond et al. 2012 equation 7?

Bob

On Apr 20, 2018, at 11:24 AM, Moritz Lürig notifications@github.com<mailto:notifications@github.com> wrote:

I am using 15 min interval environmental time series data to model GPP and ER over several consecutive days (a formatted sample is attached). However, I am getting some really weird results. There are these extreme jumps in ER from negative to positive (ER can't be positive, no?), and also predictions for O2-sat seems off - or is this just a labelling-error (see attached plot - 10-11%)?

mod.out.txthttps://github.com/USGS-R/streamMetabolizer/files/1933003/mod.out.txt mod.data.txthttps://github.com/USGS-R/streamMetabolizer/files/1932967/mod.data.txt

I am not quite sure where to start. I have only been playing around with "mle" models, not even sure is my data is fine the way it is. E.g. light - does it have to be smoothed more (I have been using raw)? If you have time to have a look or just point me in the right direction I would really appreciate it.

[do_pred]https://user-images.githubusercontent.com/15648068/39064767-e896c5b2-44cf-11e8-8e2c-4fc070bbdd52.png

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/USGS-R/streamMetabolizer/issues/367, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ALgqdqlBoOpZ-0O5NZ-dfWKupYcBqs2Iks5tqhm7gaJpZM4Td0Xf.

mluerig commented 6 years ago

Hi Bob,

thanks for your reply.

1)/4)I really should have said that these data do not come from nature but from a dozen experimental ecosystems/mesocosms (1000 L cattle-tanks). For each tank I have five chunks that are each a week long, collected between July and October. These tanks had macrophytes and sometimes a lot of phytoplankton in them. So, for most parts of the summer they were quite productive and often supersaturated with oxygen. I don't really have any expectations for K600 - should be more similar to a lake, right?

2)/3) I did use wrong values for DO.sat. However, fixing that did not really change the diel DO changes - I would think that they are just that low (the data I provided is from October).

I had a look at your L&O paper - I am guessing that the relevant equation for my is case is 5) ? Is it somehow integrated into streamMetabolizer? I also thought more about what I should define as a "day" to calculate GPP and ER from, as so far I only used the standard values (4-28). Would it make sense to align the start/end times with when O2-sat starts to drop and then rise again (e.g., in the case of that sample data ~8:00)? Of course I would have to adjust those times throughout the season...

I'm not that experienced with calculating ecosystem metabolism, so your thoughts are highly appreciated - thanks very much.

Moritz

aappling-usgs commented 6 years ago

Hi Moritz,

Following up on 2)/3): could you share/check your new DO prediction plot against the following? you should be seeing values in that second panel much closer to 100 now that you've updated your DO.sat inputs. and it would help me to see the final data, too. image.

streamMetabolizer does not have a second term for KO_over (eqn 5 from Bob's L&O paper) because we'd need additional information to tease apart ER from KOover. If the nighttime oversaturation is real, you may simply need to interpret the ER values as being ER minus the rate of O2 pushing its way into the water from bubbles. I'd like Bob's take on the comparability of bubbles from rapids and bubbles from gas getting trapped under macrophytes/algae, too - I can't decide whether they would function the same way in the model or not.

I doubt that changing the time window would have a strong effect on your results, but it shouldn't be too costly to run the model with a few different start times to find out for sure. I'd encourage you to try a few different windows for all the dates before trying to customize the window on each date.

Unsmoothed light shouldn't be a problem, though when I plotted your data, I did notice that the light peaks are occurring ~2 hrs after noon, suggesting to me that your solar.time stamps aren't precisely local solar time. This might be part of why you're needing to think about modifying the time windows? image

Alison

robohall commented 6 years ago

Mortitz,

Given that you are in cattle tanks, you can ignore the O_over argument in my L&O paper. It is easy to see why supersaturion occurs in a tank— NEP > G where G is gas exchange flux K*(Osat-O).

But working in cattle tanks you have special problems not found in streams. The big one, is that unless you are physically mixing the tanks, the O2 will not be uniformly distributed in the tank. Thus it may be difficult to estimate metabolism for the entire tank if this is occurring. Streams are nice for metabolism because we assume they are mixed through any given cross-sectional area. That may not be the case in a tank.

Bob On Apr 24, 2018, at 4:31 PM, Alison Appling notifications@github.com<mailto:notifications@github.com> wrote:

Hi Moritz,

Following up on 2)/3): could you share/check your new DO prediction plot against the following? you should be seeing values in that second panel much closer to 100 now that you've updated your DO.sat inputs. and it would help me to see the final data, too. [image]https://user-images.githubusercontent.com/12039957/39216352-218f72d0-47d0-11e8-86b2-e16cb3517c2b.png.

streamMetabolizer does not have a second term for KO_over (eqn 5 from Bob's L&O paper) because we'd need additional information to tease apart ER from KOover. If the nighttime oversaturation is real, you may simply need to interpret the ER values as being ER minus the rate of O2 pushing its way into the water from bubbles. I'd like Bob's take on the comparability of bubbles from rapids and bubbles from gas getting trapped under macrophytes/algae, too - I can't decide whether they would function the same way in the model or not.

I doubt that changing the time window would have a strong effect on your results, but it shouldn't be too costly to run the model with a few different start times to find out for sure. I'd encourage you to try a few different windows for all the dates before trying to customize the window on each date.

Unsmoothed light shouldn't be a problem, though when I plotted your data, I did notice that the light peaks are occurring ~2 hrs after noon, suggesting to me that your solar.time stamps aren't precisely local solar time. This might be part of why you're needing to think about modifying the time windows? [image]https://user-images.githubusercontent.com/12039957/39217346-6b29d85a-47d4-11e8-90c2-09004b2fb5c3.png

Alison

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/USGS-R/streamMetabolizer/issues/367#issuecomment-384100685, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ALgqdgqOTOPllKNt5AxzlS6W2BmwA8WDks5tr6fLgaJpZM4Td0Xf.

mluerig commented 6 years ago

Hi Alison and Bob,

Thanks for your replies. Here is the updated model in- and output:

mod.out1.txt mod.data1.txt

@aappling-usgs: This is what the updated predictions for the example look like (I previously had entered DO.sat as % and not mg/L). O2 sat looks right now.

do pred

I played around with different starting times: in my loop that runs through all the data-chunks I am now selecting for the hour of the day before the first sunlight is detected. In most cases this greatly improved the results, especially for ER. However, sometimes I will get just NAs as output, and when I then change the start time a little in one or the other direction it works. I guess I won't get around specifying manual starting dates.

Independent of the start time, sometimes the predicted values are not great (e.g. in day 1, 2 and 4 of this example, the peaks are "chopped off") and I would like to try to improve the fit. Do you think specifying different initial values would help (as you show in the gpp_er_eqs tutorial), or even using a different model?

About the solar.time: why is it important to use solar time instead of local time? Shouldn't time really depend on what you define as a day, i.e. starting and end times?

Another thing: I noticed that the last day in my time series is always not modeled, I assume because that day is "not complete" (the hours after midnight to complete the window are missing"). Is there any way to solve this?

@robohall: I was afraid you were going to say this. I guess there is no way to really account for this? We put the sondes that measured water parameters in the middle of the tanks (most of the bubbles were on the walls). I always assumed that because it is a relatively small volume, local differences might be existent, but negligible compared to the strong diurnal patterns across all tanks.

Cheers Moritz

mluerig commented 6 years ago

Oh, and is there a way to extract a goodness of fit? This might be a way (for me and in general) to narrow down the daily window selection if one could just pick a range, and then decide by the gof which suits best.

aappling-usgs commented 6 years ago

That's interesting that the start times are affecting your results strongly - thanks for sharing that result. I'm curious about the causes of the NAs - you can probably find explanations in the final columns of predict_metab(mm) or get_params(mm) (I assume you're still using MLE observation-error models?).

Peaks being "chopped off" can sometimes be a result of a mismatch between timestamps of the DO and light data. I can vaguely imagine that this phenomenon could also be explained by a bad estimate of K600, or maybe something about the incomplete mixing or very slow gas exchange in the cattle tank, but my thinking on that is pretty hazy right now. Maybe @robohall can do better?

About solar.time - for you, because you have your own light data, it's only important because it determines where in the daily DO cycle the date windows sit. It's more important if people are modeling rather than measuring light, because then light is a function of time. As noted above, the temporal match between the DO and light curves is important to get GPP right.

Yes, the last day in your time series is likely not being modeled because that day is not complete. A 24-hour period is required to model each day. One option, if you're OK using some hours of data twice, is to use an earlier time window for the final day such that you can fit a full 24 hours in before your data run out.

Goodness of fit is not a foolproof approach because of equifinality in metabolism models, but you could compute RMSE of the DO predictions like so:

library(dplyr)
predict_DO(mm) %>%
  mutate(err_sq = (DO.mod - DO.obs)^2) %>%
  group_by(date) %>%
  summarize(RMSE = sqrt(mean(err_sq)))

But remember that for process-error and state-space models, a close match between observed and predicted oxygen concentrations is not the [only] objective; those models also try to match the [DO] rates of change and have different definitions of DO.mod.

Other ways of assessing model output:

robohall commented 6 years ago

Another thing to consider is priors on K600. I am not sure how you can get negative estimates of K600 with a lognormal prior. For sure set a lognormal prior, but because you are in tanks, K600 should be small.

here is how I would do it

Estimate k600 (m/d) intercept when wind = 0 from all of the windy lake k literature. Divide by tank depth to get K600. Take ln of this value as mean for lognormal prior. Set scale sufficiently large enough to allow a broad range of values, say 0.7 or 1.

Also consider pooling gas exchange among all tanks, regardless of treatment. That would give 100’s of estimates of K. Because GPP and ER are not pooled, they will remain independent and reflect treatments.

Bob

On Apr 25, 2018, at 10:05 AM, Alison Appling notifications@github.com<mailto:notifications@github.com> wrote:

That's interesting that the start times are affecting your results strongly - thanks for sharing that result. I'm curious about the causes of the NAs - you can probably find explanations in the final columns of predict_metab(mm) or get_params(mm) (I assume you're still using MLE observation-error models?).

Peaks being "chopped off" can sometimes be a result of a mismatch between timestamps of the DO and light data. I can vaguely imagine that this phenomenon could also be explained by a bad estimate of K600, or maybe something about the incomplete mixing or very slow gas exchange in the cattle tank, but my thinking on that is pretty hazy right now. Maybe @robohallhttps://github.com/robohall can do better?

About solar.time - for you, because you have your own light data, it's only important because it determines where in the daily DO cycle the date windows sit. It's more important if people are modeling rather than measuring light, because then light is a function of time. As noted above, the temporal match between the DO and light curves is important to get GPP right.

Yes, the last day in your time series is likely not being modeled because that day is not complete. A 24-hour period is required to model each day. One option, if you're OK using some hours of data twice, is to use an earlier time window for the final day such that you can fit a full 24 hours in before your data run out.

Goodness of fit is not a foolproof approach because of equifinality in metabolism models, but you could compute RMSE of the DO predictions like so:

library(dplyr) predict_DO(mm) %>% mutate(err_sq = (DO.mod - DO.obs)^2) %>% group_by(date) %>% summarize(RMSE = sqrt(mean(err_sq)))

But remember that for process-error and state-space models, a close match between observed and predicted oxygen concentrations is not the [only] objective; those models also try to match the [DO] rates of change and have different definitions of DO.mod.

Other ways of assessing model output:

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/USGS-R/streamMetabolizer/issues/367#issuecomment-384341968, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ALgqdqrnbMHUH16fdLCOsbVLhBUw8fgPks5tsJ7LgaJpZM4Td0Xf.

mluerig commented 6 years ago

So, this has been a while - sorry about that.

@robohall I had a look at some literature, and I would guess that K600 is at around 1 cm/h for windspeed=0. So, the tanks are 1 m deep, so my init.K600.daily would be 1 - is something like that what you had in mind? Pooling k600 - at what point would I do that - after estimating for all the ponds, or also before as a model option. I also realized that I am working with constant air pressure (950 hPa), I forgot that I put it in as a placeholder, meant to replace it at some point - does pressure affect the models a lot?

@aappling-usgs I focused on the the starting times, because they had such big effect on the outcomes. I used a range of different starting times for the each week-long chunk: from -1.5 hrs before to 1.5 after the first light measurement, in 0.25 steps. Thus, for each chunk there are 13 models, for which I then took the output from the three models with the best RMS, and averaged them. This gave me good consistency, but I still don't really trust the values: respiration is always much higher than productivity, which is hard to believe given the overall trend of overall increasing oxygen saturation throughout the experiment. I have attached predictions and model estimates for one chunk in mid summer that really should be net productive. Out of 13 possible models, 9 ran without producing just NAs. So, in both plots, each facet shows a different starting time for the same data, and the number above are the hours before or after sunrise (the first measured light).

input data: mod.data.txt full model output: mod.out.txt

stream_metab_do_pred stream_metab_est

robohall commented 6 years ago

Moritz,

Don’t forget to correct for units, 1 cm/h is 0.24 m/d, multiply by depth and get 0.24/d. log (0.24) = -1.4. So I would pick a lognormal prior,( -1.4, 1)

View the prior dist as x<-seq(0.05,2,0.05) plot(x, dlnorm(x,-1.4,1))

You could safely broaden that prior to dlnorm(0, 1) and might be what I would do to guard againstthinking that you have that much resolotion to define and measure gas exchange when it is that low.

As for pooling, you could try to analyze all metabolism from all ponds using one model with K pooled across all.

ER >> GPP is weird unless you are adding external C or the tanks have high ER from productivity that occurred before you measure metabolism. But ER is difficult to measure.

Bob

On Jun 7, 2018, at 9:33 AM, Moritz Lürig notifications@github.com<mailto:notifications@github.com> wrote:

So, this has been a while - sorry about that.

@robohallhttps://github.com/robohall I had a look at some literature, and I would guess that K600 is at around 1 cm/h for windspeed=0. So, the tanks are 1 m deep, so my init.K600.daily would be 1 - is something like that what you had in mind? Pooling k600 - at what point would I do that - after estimating for all the ponds, or also before as a model option. I also realized that I am working with constant air pressure (950 hPa), I forgot that I put it in as a placeholder, meant to replace it at some point - does pressure affect the models a lot?

@aappling-usgshttps://github.com/aappling-usgs I focused on the the starting times, because they had such big effect on the outcomes. I used a range of different starting times for the each week-long chunk: from -1.5 hrs before to 1.5 after the first light measurement, in 0.25 steps. Thus, for each chunk there are 13 models, for which I then took the output from the three models with the best RMS, and averaged them. This gave me good consistency, but I still don't really trust the values: respiration is always much higher than productivity, which is hard to believe given the overall trend of overall increasing oxygen saturation throughout the experiment. I have attached predictions and model estimates for one chunk in mid summer that really should be net productive. Out of 13 possible models, 9 ran without producing just NAs. So, in both plots, each facet shows a different starting time for the same data, and the number above are the hours before or after sunrise (the first measured light).

input data: mod.data.txthttps://github.com/USGS-R/streamMetabolizer/files/2080881/mod.data.txt full model output: mod.out.txthttps://github.com/USGS-R/streamMetabolizer/files/2080885/mod.out.txt

[stream_metab_do_pred]https://user-images.githubusercontent.com/15648068/41106646-bb26f86a-6a70-11e8-8f41-3c8017a0cfab.png [stream_metab_est]https://user-images.githubusercontent.com/15648068/41106645-bb0be746-6a70-11e8-875d-4c7393255d88.png

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/USGS-R/streamMetabolizer/issues/367#issuecomment-395465046, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ALgqdn40CUAtgYtZ66fpl9nLD1Bwn68Lks5t6UfWgaJpZM4Td0Xf.

mluerig commented 6 years ago

@robohall, Hi Bob,

thanks for your suggestions. I gave bayesian models a try and found that they seem to work better with my data (see model specs below). P:R ratio is now flickering around 1 throughout the experiment, which seems more meaningful than what I had before. The problem that remains are the "chopped off peaks" (see plot below, same data as above/before). As @aappling-usgs suggested this may be due to bad estimates of K600. I used pool_K600='normal', but that's not what you had in mind with "analyze all metabolism from all ponds using one model with K pooled across all", is it? How would I do that?

I am also still struggling with specifying a lognormal prior, which, as I understand, for bayes-models works with K600_daily_meanlog_meanlog rather than init.K600.daily, right? So, how would I by your suggestion specifiy a prior of dlnorm(0, 1) for a bayes model?

Thanks very much again!

plot_do_preds

mod_name <- mm_name(type='bayes', pool_K600='normal') 
mod_specs <- specs(mod_name, day_start=t, day_end=24+t)
mod_fit1 <- metab(mod_specs, data=mod.data)
mod_fit1 

metab_model of type metab_bayes 
streamMetabolizer version 0.10.9 
Specifications:
  model_name                 b_Kn_oipi_tr_plrckm.stan                                                                                                                         
  engine                     stan                                                                                                                                             
  split_dates                FALSE                                                                                                                                            
  keep_mcmcs                 TRUE                                                                                                                                             
  keep_mcmc_data             TRUE                                                                                                                                             
  day_start                  7                                                                                                                                                
  day_end                    31                                                                                                                                               
  day_tests                  full_day, even_timesteps, complete_data, pos_discharge                                                                                           
  required_timestep          NA                                                                                                                                               
  GPP_daily_mu               3.1                                                                                                                                              
  GPP_daily_lower            -Inf                                                                                                                                             
  GPP_daily_sigma            6                                                                                                                                                
  ER_daily_mu                -7.1                                                                                                                                             
  ER_daily_upper             Inf                                                                                                                                              
  ER_daily_sigma             7.1                                                                                                                                              
  K600_daily_meanlog_meanlog 2.484906649788                                                                                                                                   
  K600_daily_meanlog_sdlog   1.32                                                                                                                                             
  K600_daily_sdlog_sigma     0.05                                                                                                                                             
  err_obs_iid_sigma_scale    0.03                                                                                                                                             
  err_proc_iid_sigma_scale   5                                                                                                                                                
  params_in                  GPP_daily_mu, GPP_daily_lower, GPP_daily_sigma, ER_daily_mu, ER_daily_upper, ER_daily_sigma, K600_daily_meanlog_meanlog, K600_daily_meanlog_sd...
  params_out                 GPP, ER, GPP_daily, ER_daily, K600_daily, K600_daily_predlog, K600_daily_sdlog, err_obs_iid_sigma, err_obs_iid, err_proc_iid_sigma, err_proc_iid 
  n_chains                   4                                                                                                                                                
  n_cores                    4                                                                                                                                                
  burnin_steps               500                                                                                                                                              
  saved_steps                500                                                                                                                                              
  thin_steps                 1                                                                                                                                                
  verbose                    FALSE                                                                                                                                            
  model_path                 E:/R/01_my_library/streamMetabolizer/models/b_Kn_oipi_tr_plrckm.stan                                                                             
Fitting time: 88.46 secs elapsed
Parameters (6 dates):
        date  GPP.daily GPP.daily.lower GPP.daily.upper    ER.daily ER.daily.lower ER.daily.upper  K600.daily K600.daily.lower K600.daily.upper msgs.fit
1 2015-09-26 0.1448576        0.1112568       0.1770925 -0.1057234      -0.1690940    -0.02520984 0.06866319        0.03919514        0.1446102  w      
2 2015-09-27 0.2062770        0.1599451       0.2515621 -0.2950681      -0.3624439    -0.22003165 0.06875709        0.04028494        0.1438024  w      
3 2015-09-28 0.3768529        0.3368254       0.4129556 -0.2729357      -0.3335530    -0.20351164 0.06869980        0.04081866        0.1428849  w      
4 2015-09-29 0.2438324        0.2003844       0.2792674 -0.1572114      -0.2468089    -0.08244864 0.06838458        0.04089358        0.1479235  w      
5 2015-09-30 0.2380278        0.2009854       0.2891546 -0.1878116      -0.2560476    -0.12553222 0.06856512        0.04044155        0.1455956  w      
6 2015-10-01        NA               NA              NA         NA              NA             NA         NA                NA               NA  w     E
Fitting warnings:
  There were 8 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
  There were 8 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See
http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded
  There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See
http://mc-stan.org/misc/warnings.html#bfmi-low
  Examine the pairs() plot to diagnose sampling problems
  6 dates: overall warnings
Fitting errors:
  1 date: data don't end when expected
  1 date: NAs in light
Predictions (6 dates):
# A tibble: 6 x 9
  date          GPP GPP.lower GPP.upper      ER ER.lower ER.upper msgs.fit  msgs.pred
  <date>      <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl> <chr>     <chr>    
1 2015-09-26  0.145     0.111     0.177  -0.106   -0.169  -0.0252 "w      " "       "
2 2015-09-27  0.206     0.160     0.252  -0.295   -0.362  -0.220  "w      " "       "
3 2015-09-28  0.377     0.337     0.413  -0.273   -0.334  -0.204  "w      " "       "
4 2015-09-29  0.244     0.200     0.279  -0.157   -0.247  -0.0824 "w      " "       "
5 2015-09-30  0.238     0.201     0.289  -0.188   -0.256  -0.126  "w      " "       "
6 2015-10-01 NA        NA        NA      NA       NA      NA      w     E   "       "
robohall commented 6 years ago

Moritz,

Yes Kn is the pooling option I would use.

I now see that it may not be easy to do all tanks at once because although sM does not include time in models (today does not depend on yesterday), it does use time to decide what a ‘day’ is. You could be sneaky and give each tank its proper day-month, but make up any year to distinguish tanks, i.e. tank 6 could be 2026. That way you pool across all tanks, which should be fine. I would do this, especially if individual tanks give very different values of K.

for the prior set

K600_daily_meanlog_meanlog = 0

K600_daily_meanlog_sdlog =1

Don’t worry about inits unless the MCMC chains get lost.

I see based on the blue graph below, you have lots of process error. There may be processes going on in the tanks—probably poorly mixed—than the model is considering. Check to make sure the process error you get does not exceed the prior. It should be fine since I think we used a half cauchy prior dist. for that.

Double and triple check your light data to make sure its time is correct and that this time matches that from the DO sensors. A typical problem is when folks have light in one time zone and oxygen in another due to accounting (or not) for summer time changes.

Bob

On Jul 5, 2018, at 12:19 PM, Moritz Lürig notifications@github.com<mailto:notifications@github.com> wrote:

@robohallhttps://github.com/robohall, Hi Bob,

thanks for your suggestions. I gave bayesian models a try and found that they seem to work better with my data (see model specs below). P:R ratio is now flickering around 1 throughout the experiment, which seems more meaningful than what I had before. The problem that remains are the "chopped off peaks" (see plot below, same data as above/before). As @aappling-usgshttps://github.com/aappling-usgs suggested this may be due to bad estimates of K600. I used pool_K600='normal', but that's not what you had in mind with "analyze all metabolism from all ponds using one model with K pooled across all", is it? How would I do that?

I am also still struggling with specifying a lognormal prior, which, as I understand, for bayes-models works with K600_daily_meanlog_meanlog rather than init.K600.daily, right? So, how would I by your suggestion specifiy a prior of dlnorm(0, 1) for a bayes model?

Thanks very much again!

[plot_do_preds]https://user-images.githubusercontent.com/15648068/42340445-4f695114-8090-11e8-971c-e96b91e015cf.png

mod_name <- mm_name(type='bayes', pool_K600='normal') mod_specs <- specs(mod_name, day_start=t, day_end=24+t) mod_fit1 <- metab(mod_specs, data=mod.data) mod_fit1

metab_model of type metab_bayes streamMetabolizer version 0.10.9 Specifications: model_name b_Kn_oipi_tr_plrckm.stan engine stan split_dates FALSE keep_mcmcs TRUE keep_mcmc_data TRUE day_start 7 day_end 31 day_tests full_day, even_timesteps, complete_data, pos_discharge required_timestep NA GPP_daily_mu 3.1 GPP_daily_lower -Inf GPP_daily_sigma 6 ER_daily_mu -7.1 ER_daily_upper Inf ER_daily_sigma 7.1 K600_daily_meanlog_meanlog 2.484906649788 K600_daily_meanlog_sdlog 1.32 K600_daily_sdlog_sigma 0.05 err_obs_iid_sigma_scale 0.03 err_proc_iid_sigma_scale 5 params_in GPP_daily_mu, GPP_daily_lower, GPP_daily_sigma, ER_daily_mu, ER_daily_upper, ER_daily_sigma, K600_daily_meanlog_meanlog, K600_daily_meanlog_sd... params_out GPP, ER, GPP_daily, ER_daily, K600_daily, K600_daily_predlog, K600_daily_sdlog, err_obs_iid_sigma, err_obs_iid, err_proc_iid_sigma, err_proc_iid n_chains 4 n_cores 4 burnin_steps 500 saved_steps 500 thin_steps 1 verbose FALSE model_path E:/R/01_my_library/streamMetabolizer/models/b_Kn_oipi_tr_plrckm.stan Fitting time: 88.46 secs elapsed Parameters (6 dates): date GPP.daily GPP.daily.lower GPP.daily.upper ER.daily ER.daily.lower ER.daily.upper K600.daily K600.daily.lower K600.daily.upper msgs.fit 1 2015-09-26 0.1448576 0.1112568 0.1770925 -0.1057234 -0.1690940 -0.02520984 0.06866319 0.03919514 0.1446102 w 2 2015-09-27 0.2062770 0.1599451 0.2515621 -0.2950681 -0.3624439 -0.22003165 0.06875709 0.04028494 0.1438024 w 3 2015-09-28 0.3768529 0.3368254 0.4129556 -0.2729357 -0.3335530 -0.20351164 0.06869980 0.04081866 0.1428849 w 4 2015-09-29 0.2438324 0.2003844 0.2792674 -0.1572114 -0.2468089 -0.08244864 0.06838458 0.04089358 0.1479235 w 5 2015-09-30 0.2380278 0.2009854 0.2891546 -0.1878116 -0.2560476 -0.12553222 0.06856512 0.04044155 0.1455956 w 6 2015-10-01 NA NA NA NA NA NA NA NA NA w E Fitting warnings: There were 8 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup There were 8 transitions after warmup that exceeded the maximum treedepth. Increase max_treedepth above 10. See http://mc-stan.org/misc/warnings.html#maximum-treedepth-exceeded There were 4 chains where the estimated Bayesian Fraction of Missing Information was low. See http://mc-stan.org/misc/warnings.html#bfmi-low Examine the pairs() plot to diagnose sampling problems 6 dates: overall warnings Fitting errors: 1 date: data don't end when expected 1 date: NAs in light Predictions (6 dates):

A tibble: 6 x 9

date GPP GPP.lower GPP.upper ER ER.lower ER.upper msgs.fit msgs.pred

1 2015-09-26 0.145 0.111 0.177 -0.106 -0.169 -0.0252 "w " " " 2 2015-09-27 0.206 0.160 0.252 -0.295 -0.362 -0.220 "w " " " 3 2015-09-28 0.377 0.337 0.413 -0.273 -0.334 -0.204 "w " " " 4 2015-09-29 0.244 0.200 0.279 -0.157 -0.247 -0.0824 "w " " " 5 2015-09-30 0.238 0.201 0.289 -0.188 -0.256 -0.126 "w " " " 6 2015-10-01 NA NA NA NA NA NA w E " " — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
mluerig commented 6 years ago

Hi Bob,

Pooling: I tried the pooling approach by pond=year (nice idea btw), but it didn't really affect my results for GPP and ER; K600 was always between 0.5 and 1). I think that is reasonable comparing the numbers to literature. I was wondering whether it would make sense to fix K600 completely. Some colleagues from the aquatic physics department here at Eawag have found that there is virtually no mixing in artificial ponds. The ones they measured was much bigger (15000L), but still remained completely stratified. Is it possible to fix K600 or make it linearly dependent on temp? Or can I constrain the model somehow to a range of small values? Either way, I think any model returning very small K600 like should well reflect the (poor) mixing dynamics in these tanks.

Process error: I found that this depends i) on the start of my day, ii) how much I smooth my data.

If I wanted to quantify the process error (how well does the blue curve fit the dots), how might I do that?

26 12

mluerig commented 6 years ago

I'm gonna close this. Thanks SO MUCH for your feedback, this was very helpful and also fun, getting the models to work. I'll share the paper when it's out. Given that more and more people do experiments in mesocosms (and I do think here metabolism is somewhat under-researched) it might be interesting and useful to have a setting to tweak the models towards experimental ponds, with little or no mixing.