DOI-USGS / streamMetabolizer

streamMetabolizer uses inverse modeling to estimate aquatic metabolism (photosynthesis and respiration) from time series data on dissolved oxygen, water temperature, depth, and light.
http://usgs-r.github.io/streamMetabolizer/
Other
37 stars 22 forks source link

Model estimation of model dissolved oxygen cuts off between days + dO/dt units check #393

Closed amina-uwa closed 4 years ago

amina-uwa commented 4 years ago

Dear Alison,

I have been trying to implement the streamMetabolizer code at different sites, and considering changes in the priors used for the model to be site specific. I have been running in to the problem of the model chopping off a part of the predictions for DO, DO sat, and do/dt at a specific time each day. And I am still unsure why is this happening. The one different thing I did was converting my input data to hourly after I converted the to solar time from local time because of my weather data ( light). This is the only difference I have than the original codes because the light is being measured on an hourly basis whereas my data are every 10 mins. I have looked at older posts #367 which had a similar question and tried changing the start and end date but got the following error. I ran the default 4 and 28 day_start and end and got the initial plot I attached which has the chopped off bit. But when I changed the start day I got the error below. I am not sure where the error is coming from. I am also attaching a plot for all the input hourly data.

My second question is regarding the dO/dt. I am a bit confused of having 24 readings blue dots for observations with the mg/L/day unit in the third plot of summer_plot.png attached. My understanding is that the Bayesian model generated daily data which is then transformed with an ODE to get the hourly predictions? Any idea if I understood it wrong?

What you saw on your computer

Warning message: In system.time({ : multi-day models should probably have day_end - day_start <= 24 hours

Summer_plot hourly data2 Hourly data1

data-summer.txt model_input_oupot.txt

Include all code you ran (a minimal example) and all console output, errors, and warnings. Include a data file if needed. mm <- metab_bayes(data=dat,specs(mm_name('bayes', err_proc_iid=FALSE),day_start =-1.5, day_end = 28, n_cores=4, n_chains=5, burnin_steps=500, saved_steps=500))

Session information

R version 4.0.0 (2020-04-24) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale: [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] wql_0.4.9 fpp_0.5 tseries_0.10-47
[4] lmtest_0.9-37 expsmooth_2.3 fma_2.4
[7] forecast_8.12 LakeMetabolizer_1.5.0 rLakeAnalyzer_1.11.4.1
[10] StreamPULSE_0.0.0.9043 Cairo_1.5-12 shiny_1.4.0.2
[13] forcats_0.5.0 stringr_1.4.0 purrr_0.3.4
[16] readr_1.3.1 tibble_3.0.0 tidyverse_1.3.0
[19] scales_1.1.0 cowplot_1.0.0 reshape_0.8.8
[22] lubridate_1.7.8 xts_0.12-0 zoo_1.8-7
[25] tidyr_1.0.2 imputeTS_3.0 devtools_2.3.0
[28] usethis_1.6.0 rstan_2.19.3 StanHeaders_2.19.2
[31] ggplot2_3.3.0 dplyr_0.8.5 streamMetabolizer_0.10.9

loaded via a namespace (and not attached): [1] colorspace_1.4-1 ellipsis_0.3.0 rprojroot_1.3-2 fs_1.4.1
[5] rstudioapi_0.11 farver_2.0.3 remotes_2.1.1 fansi_0.4.1
[9] xml2_1.3.1 codetools_0.2-16 splines_4.0.0 pkgload_1.0.2
[13] jsonlite_1.6.1 broom_0.5.6 dbplyr_1.4.3 compiler_4.0.0
[17] httr_1.4.1 backports_1.1.6 assertthat_0.2.1 Matrix_1.2-18
[21] fastmap_1.0.1 lazyeval_0.2.2 cli_2.0.2 later_1.0.0
[25] htmltools_0.4.0 prettyunits_1.1.1 tools_4.0.0 gtable_0.3.0
[29] glue_1.4.0 reshape2_1.4.4 Rcpp_1.0.4.6 cellranger_1.1.0
[33] fracdiff_1.5-1 vctrs_0.2.4 urca_1.3-0 nlme_3.1-147
[37] timeDate_3043.102 ps_1.3.2 testthat_2.3.2 rvest_0.3.5
[41] mime_0.9 lifecycle_0.2.0 unitted_0.2.9 hms_0.5.3
[45] promises_1.1.0 parallel_4.0.0 inline_0.3.15 yaml_2.2.1
[49] quantmod_0.4.17 curl_4.3 memoise_1.1.0 gridExtra_2.3
[53] loo_2.2.0 stringi_1.4.6 dygraphs_1.1.1.6 desc_1.2.0
[57] TTR_0.23-6 pkgbuild_1.0.7 rlang_0.4.5 pkgconfig_2.0.3
[61] matrixStats_0.56.0 lattice_0.20-41 stinepack_1.4 htmlwidgets_1.5.1 [65] labeling_0.3 processx_3.4.2 tidyselect_1.0.0 deSolve_1.28
[69] plyr_1.8.6 magrittr_1.5 R6_2.4.1 generics_0.0.2
[73] DBI_1.1.0 pillar_1.4.3 haven_2.2.0 withr_2.2.0
[77] mgcv_1.8-31 nnet_7.3-13 modelr_0.1.6 crayon_1.3.4
[81] utf8_1.1.4 plotly_4.9.2.1 grid_4.0.0 readxl_1.3.1
[85] data.table_1.12.8 callr_3.4.3 reprex_0.3.0 digest_0.6.25
[89] xtable_1.8-4 httpuv_1.5.2 stats4_4.0.0 munsell_0.5.0
[93] viridisLite_0.3.0 sessioninfo_1.1.1 quadprog_1.5-8

> devtools::session_info()
aappling-usgs commented 4 years ago

For the Bayesian models, the day_start and day_end times need to be <= 24 hours apart and really should be exactly 24 hours apart, which is why you're getting the warning "day_end - day_start <= 24 hours".

I think the "chopping off" you're seeing is simply the way ggplot2 works: if a line is only connected for the hours 4am through the following 3am, then it won't extend beyond those hours. Thus, two adjacent days that run 4am to 3am will have a 1-hour gap between them even though there are hourly predictions available for every hour. This is probably more apparent to you when you're using temporal resolution of 1 hour relative to something finer, because a 15-minute or 5-minute gap is just harder to spot.

My second question is regarding the dO/dt. I am a bit confused of having 24 readings blue dots for observations with the mg/L/day unit in the third plot of summer_plot.png attached. My understanding is that the Bayesian model generated daily data which is then transformed with an ODE to get the hourly predictions? Any idea if I understood it wrong?

I'm not sure I understand your concern here, but maybe it would help to affirm that yes, the main product of the Bayesian model is daily estimates of mean daily GPP, ER, and K600. The predict_DO() and plot_DO_preds() functions generate sub-daily values from those daily estimates simply as a way to see what those daily estimates imply about sub-daily patterns. Also note (and see https://github.com/USGS-R/streamMetabolizer/issues/294) that the predictions from those functions are not the estimated model state (they omit process-error corrections) for Bayesian models that include process error.

aappling-usgs commented 4 years ago

I don't see anything needing a fix within streamMetabolizer based on your comments, so I'm going to close this issue, but feel free to add more comments if I wasn't able to clear things up.

amina-uwa commented 4 years ago

Thank you Alison for your reply. I have generated 10 mins interval of my light data and ended up using the 10 mins data I already had for my model input. However, I still have a large cut off in DO model between 3:52 am and 4:02 am (plot1). I am not sure if I understood what you meant by ggplot having this problem. When I make the plot by myself by accessing the model data, I get a sharp line as well ( plot_2).

For question 2, I find it a bit confusing, could you kindly explain how is dO/dt is being estimated? Looking at the plot_DO_preds I could see that there is a lag function used to calculate dO/dt for each time step. Could you explain where the Do.obs (blue dots ) vs DO.mod model (blue line) values come from for the package plot ( in my case its red dots and black line) ? I am attaching the plots for this run.

DO

DO_plot_2

DO_test.txt

aappling-usgs commented 4 years ago

I have generated 10 mins interval of my light data and ended up using the 10 mins data I already had for my model input. However, I still have a large cut off in DO model between 3:52 am and 4:02 am (plot1). I am not sure if I understood what you meant by ggplot having this problem. When I make the plot by myself by accessing the model data, I get a sharp line as well ( plot_2).

OK, I think I misunderstood your question - I thought you were talking about a 60-minute gap between 3:02 am and 4:02 am, but now I think you're talking about a ~1 mg/L gap between 5.2 and 6.2 mg/L at whichever times end one day and begin the next. Do I understand you correctly now? If so, then what you're seeing reflects the different daily values of GPP and ER that have been estimated, which when further combined with different starting DO values simply don't guarantee a matchup in DO predictions between one day and the next. I agree that the models would be still more believable if these values matched up, but these mismatches are quite common across streamMetabolizer and other metabolism models, weren't even addressed in earlier generations of metabolism estimations based on just ~24 hours of monitoring data, and would require some substantial new innovation to avoid.

aappling-usgs commented 4 years ago

My second question is regarding the dO/dt. I am a bit confused of having 24 readings blue dots for observations with the mg/L/day unit in the third plot of summer_plot.png attached. My understanding is that the Bayesian model generated daily data which is then transformed with an ODE to get the hourly predictions? Any idea if I understood it wrong?

I'm not sure I understand your concern here, but maybe it would help to affirm that yes, the main product of the Bayesian model is daily estimates of mean daily GPP, ER, and K600. The predict_DO() and plot_DO_preds() functions generate sub-daily values from those daily estimates simply as a way to see what those daily estimates imply about sub-daily patterns. Also note (and see #294) that the predictions from those functions are not the estimated model state (they omit process-error corrections) for Bayesian models that include process error.

Looking at the plot_DO_preds I could see that there is a lag function used to calculate dO/dt for each time step. Could you explain where the Do.obs (blue dots ) vs DO.mod model (blue line) values come from for the package plot ( in my case its red dots and black line) ? I am attaching the plots for this run.

For the purpose of building DO.mod, DO/dt is calculated as a function of GPP, ER, and K600 - probably more like what you were expecting. But for the purpose of plotting DO/dt, we calculate DO/dt from both DO.mod and DO.obs as diff(DO.xxx)/as.numeric(diff(solar.time), units="days")) (https://github.com/USGS-R/streamMetabolizer/blob/master/R/plot_DO_preds.R#L68).