PecanProject / pecan

The Predictive Ecosystem Analyzer (PEcAn) is an integrated ecological bioinformatics toolbox.
www.pecanproject.org

<bety><write> tag not respected by `get.trait.data()` #2968

Closed Aariq closed 1 year ago

Aariq commented 2 years ago

Bug Description

get.trait.data() seems to write to BETY (not in documentation), but it doesn't seem to check for settings$database$bety$write.

To Reproduce

Run a PEcAn workflow with settings$database$bety$write <- FALSE, then check whether file.path(settings$database$dbfiles, "posterior", settings$pfts$pft$posteriorid) exists.
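A minimal sketch of that check, assuming a `settings` list shaped like the one described in this report. The dbfiles path and posterior ID below are placeholders invented for illustration, not values from a real run.

``` r
# Hypothetical stand-in for a parsed PEcAn settings list; all values are placeholders.
settings <- list(
  database = list(
    bety = list(write = FALSE),
    dbfiles = file.path(tempdir(), "dbfiles")
  ),
  pfts = list(pft = list(posteriorid = "12345"))
)

# Directory that would hold the registered posterior files, built as in the report.
posterior_dir <- file.path(
  settings$database$dbfiles, "posterior", settings$pfts$pft$posteriorid
)

# With write == FALSE, the expectation is that this is still FALSE after the workflow runs.
posterior_written <- dir.exists(posterior_dir)
posterior_written
```

In a real workflow the same `dir.exists()` call would be made after `runModule.get.trait.data(settings)` has run.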

Expected behavior

Nothing should be written to settings$database$dbfiles if settings$database$bety$write == FALSE

Aariq commented 2 years ago

Just to clarify, this means that every time get.trait.data() is run, it writes new posteriors to BETY. Here's an example of running runModule.get.trait.data() twice in a row with settings$database$bety$write <- FALSE:

``` r
> # Query trait database ----------------------------------------------------
> settings <- runModule.get.trait.data(settings)
2022-07-22 13:29:02 DEBUG [PEcAn.DB::get.trait.data] : `trait.names` is NULL, so retrieving all traits that have at least one prior for these PFTs.
2022-07-22 13:29:03 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:29:04 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:04 INFO [query.trait.data] : stomatal_slope
2022-07-22 13:29:04 INFO [query.trait.data] : Median stomatal_slope : 3.79
2022-07-22 13:29:04 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:04 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:04 INFO [query.trait.data] : SLA
2022-07-22 13:29:04 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:29:04 INFO [query.trait.data] : Median SLA : 43
2022-07-22 13:29:04 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:04 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:04 INFO [query.trait.data] : Vcmax
2022-07-22 13:29:04 INFO [query.trait.data] : Median Vcmax : 24.367
2022-07-22 13:29:04 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:04 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:04 INFO [query.trait.data] : cuticular_cond
2022-07-22 13:29:04 INFO [query.trait.data] : Median cuticular_cond : 30546
2022-07-22 13:29:04 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:04 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:04 INFO [query.trait.data] : quantum_efficiency
2022-07-22 13:29:04 INFO [query.trait.data] : Median quantum_efficiency : 0.062
2022-07-22 13:29:04 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:04 INFO [FUN] : Number of observations per trait for PFT 'SetariaWT' :
               trait  n
1     cuticular_cond 33
2 quantum_efficiency 27
3                SLA 52
4     stomatal_slope 33
5              Vcmax 33
2022-07-22 13:29:04 INFO [FUN] : Summary of prior distributions for PFT 'SetariaWT' :
                        distn parama  paramb  n
mort2                   gamma  1.470  0.0578  0
growth_resp_factor       beta  2.630  6.5200  0
leaf_turnover_rate      gamma  2.900  0.6300 40
leaf_width              gamma  6.530  1.4900 17
nonlocal_dispersal       beta 20.300 76.1000 30
fineroot2leaf           lnorm  0.811  0.8430  0
root_turnover_rate    weibull  1.670  0.6570 66
seedling_mortality       beta  3.610  0.4330  0
stomatal_slope        weibull  3.630  3.8100  4
quantum_efficiency       norm  0.057  0.0060 56
Vcmax                   lnorm  3.750  0.3000 12
r_fract                  beta  2.000  4.0000  0
cuticular_cond          lnorm  8.400  0.9000  0
root_respiration_rate weibull  2.660  6.2900 35
Vm_low_temp              norm 10.000  1.0200  0
SLA                   weibull  5.000 50.0000  0
2022-07-22 13:29:04 DEBUG [FUN] : The following posterior files found in PFT outdir ( '/data/tests/ed2_testout/pft/SetariaWT' ) will be registered in BETY under posterior ID 9000001246 : 'prior.distns.csv', 'prior.distns.Rdata', 'species.csv', 'trait.data.csv', 'trait.data.Rdata' .
The following files (if any) will not be registered because they already existed:
2022-07-22 13:29:05 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:29:05 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:29:05 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:29:05 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:29:05 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:29:06 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:06 INFO [query.trait.data] : c2n_leaf
2022-07-22 13:29:06 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:29:06 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:29:06 INFO [query.trait.data] : Median c2n_leaf : 32.877
2022-07-22 13:29:06 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:06 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:06 INFO [query.trait.data] : SLA
2022-07-22 13:29:06 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:29:06 INFO [query.trait.data] : Median SLA : 15.713
2022-07-22 13:29:06 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:06 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:06 INFO [query.trait.data] : leaf_respiration_rate_m2
2022-07-22 13:29:06 INFO [query.trait.data] : Median leaf_respiration_rate_m2 : 1.015
2022-07-22 13:29:06 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:06 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:06 INFO [query.trait.data] : Vcmax
2022-07-22 13:29:06 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:29:06 INFO [query.trait.data] : Median Vcmax : 43.212
2022-07-22 13:29:06 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:06 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:06 INFO [query.trait.data] : quantum_efficiency
2022-07-22 13:29:06 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:29:06 INFO [query.trait.data] : Median quantum_efficiency : 0.052
2022-07-22 13:29:06 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:29:06 INFO [FUN] : Number of observations per trait for PFT 'ebifarm.c3grass' :
                     trait  n
1                 c2n_leaf 57
2 leaf_respiration_rate_m2  2
3       quantum_efficiency  6
4                      SLA 19
5                    Vcmax  3
2022-07-22 13:29:06 INFO [FUN] : Summary of prior distributions for PFT 'ebifarm.c3grass' :
                           distn parama  paramb   n
mort2                      gamma  1.470  0.0578   0
growth_resp_factor          beta  2.630  6.5200   0
fineroot2leaf              lnorm  0.811  0.8430   0
root_turnover_rate       weibull  1.670  0.6570  66
seedling_mortality          beta  3.610  0.4330   0
Vcmax                      lnorm  4.510  0.6400  19
stomatal_slope             lnorm  2.590  0.2600  11
r_fract                     beta  2.000  4.0000   0
c2n_leaf                   gamma  4.180  0.1300  95
root_respiration_rate    weibull  2.660  6.2900  35
SLA                      weibull  2.060 19.0000 125
water_conductance          lnorm -5.400  3.0000   0
quantum_efficiency       weibull  3.320  0.0800   0
leaf_respiration_rate_m2   lnorm  0.632  0.6500  32
2022-07-22 13:29:06 DEBUG [FUN] : The following posterior files found in PFT outdir ( '/data/tests/ed2_testout/pft/ebifarm.c3grass' ) will be registered in BETY under posterior ID 9000001247 : 'prior.distns.csv', 'prior.distns.Rdata', 'species.csv', 'trait.data.csv', 'trait.data.Rdata' .
The following files (if any) will not be registered because they already existed:
> # Query trait database ----------------------------------------------------
> settings <- runModule.get.trait.data(settings)
2022-07-22 13:32:31 DEBUG [PEcAn.DB::get.trait.data] : `trait.names` is NULL, so retrieving all traits that have at least one prior for these PFTs.
2022-07-22 13:32:32 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:32:33 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:33 INFO [query.trait.data] : stomatal_slope
2022-07-22 13:32:33 INFO [query.trait.data] : Median stomatal_slope : 3.79
2022-07-22 13:32:33 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:33 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:33 INFO [query.trait.data] : SLA
2022-07-22 13:32:33 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:32:33 INFO [query.trait.data] : Median SLA : 43
2022-07-22 13:32:33 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:33 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:33 INFO [query.trait.data] : Vcmax
2022-07-22 13:32:33 INFO [query.trait.data] : Median Vcmax : 24.367
2022-07-22 13:32:33 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:33 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:33 INFO [query.trait.data] : cuticular_cond
2022-07-22 13:32:33 INFO [query.trait.data] : Median cuticular_cond : 30546
2022-07-22 13:32:33 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:33 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:33 INFO [query.trait.data] : quantum_efficiency
2022-07-22 13:32:33 INFO [query.trait.data] : Median quantum_efficiency : 0.062
2022-07-22 13:32:33 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:33 INFO [FUN] : Number of observations per trait for PFT 'SetariaWT' :
               trait  n
1     cuticular_cond 33
2 quantum_efficiency 27
3                SLA 52
4     stomatal_slope 33
5              Vcmax 33
2022-07-22 13:32:33 INFO [FUN] : Summary of prior distributions for PFT 'SetariaWT' :
                        distn parama  paramb  n
mort2                   gamma  1.470  0.0578  0
growth_resp_factor       beta  2.630  6.5200  0
leaf_turnover_rate      gamma  2.900  0.6300 40
leaf_width              gamma  6.530  1.4900 17
nonlocal_dispersal       beta 20.300 76.1000 30
fineroot2leaf           lnorm  0.811  0.8430  0
root_turnover_rate    weibull  1.670  0.6570 66
seedling_mortality       beta  3.610  0.4330  0
stomatal_slope        weibull  3.630  3.8100  4
quantum_efficiency       norm  0.057  0.0060 56
Vcmax                   lnorm  3.750  0.3000 12
r_fract                  beta  2.000  4.0000  0
cuticular_cond          lnorm  8.400  0.9000  0
root_respiration_rate weibull  2.660  6.2900 35
Vm_low_temp              norm 10.000  1.0200  0
SLA                   weibull  5.000 50.0000  0
2022-07-22 13:32:33 DEBUG [FUN] : The following posterior files found in PFT outdir ( '/data/tests/ed2_testout/pft/SetariaWT' ) will be registered in BETY under posterior ID 9000001248 : 'prior.distns.csv', 'prior.distns.Rdata', 'species.csv', 'trait.data.csv', 'trait.data.Rdata' .
The following files (if any) will not be registered because they already existed:
2022-07-22 13:32:35 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:32:35 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:32:35 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:32:35 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:32:35 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:32:35 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:35 INFO [query.trait.data] : c2n_leaf
2022-07-22 13:32:35 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:32:35 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:32:35 INFO [query.trait.data] : Median c2n_leaf : 32.877
2022-07-22 13:32:35 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:35 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:35 INFO [query.trait.data] : SLA
2022-07-22 13:32:35 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:32:35 INFO [query.trait.data] : Median SLA : 15.713
2022-07-22 13:32:35 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:35 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:35 INFO [query.trait.data] : leaf_respiration_rate_m2
2022-07-22 13:32:35 INFO [query.trait.data] : Median leaf_respiration_rate_m2 : 1.015
2022-07-22 13:32:35 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:35 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:35 INFO [query.trait.data] : Vcmax
2022-07-22 13:32:35 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:32:35 INFO [query.trait.data] : Median Vcmax : 43.212
2022-07-22 13:32:35 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:35 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:35 INFO [query.trait.data] : quantum_efficiency
2022-07-22 13:32:35 ERROR [PEcAn.utils::transformstats] : data contains untransformed statistics
2022-07-22 13:32:35 INFO [query.trait.data] : Median quantum_efficiency : 0.052
2022-07-22 13:32:35 INFO [query.trait.data] : ---------------------------------------------------------
2022-07-22 13:32:35 INFO [FUN] : Number of observations per trait for PFT 'ebifarm.c3grass' :
                     trait  n
1                 c2n_leaf 57
2 leaf_respiration_rate_m2  2
3       quantum_efficiency  6
4                      SLA 19
5                    Vcmax  3
2022-07-22 13:32:35 INFO [FUN] : Summary of prior distributions for PFT 'ebifarm.c3grass' :
                           distn parama  paramb   n
mort2                      gamma  1.470  0.0578   0
growth_resp_factor          beta  2.630  6.5200   0
fineroot2leaf              lnorm  0.811  0.8430   0
root_turnover_rate       weibull  1.670  0.6570  66
seedling_mortality          beta  3.610  0.4330   0
Vcmax                      lnorm  4.510  0.6400  19
stomatal_slope             lnorm  2.590  0.2600  11
r_fract                     beta  2.000  4.0000   0
c2n_leaf                   gamma  4.180  0.1300  95
root_respiration_rate    weibull  2.660  6.2900  35
SLA                      weibull  2.060 19.0000 125
water_conductance          lnorm -5.400  3.0000   0
quantum_efficiency       weibull  3.320  0.0800   0
leaf_respiration_rate_m2   lnorm  0.632  0.6500  32
2022-07-22 13:32:35 DEBUG [FUN] : The following posterior files found in PFT outdir ( '/data/tests/ed2_testout/pft/ebifarm.c3grass' ) will be registered in BETY under posterior ID 9000001249 : 'prior.distns.csv', 'prior.distns.Rdata', 'species.csv', 'trait.data.csv', 'trait.data.Rdata' .
The following files (if any) will not be registered because they already existed:
```

The first run registers 'prior.distns.csv', 'prior.distns.Rdata', 'species.csv', 'trait.data.csv', and 'trait.data.Rdata' under posterior IDs 9000001246 and 9000001247 (one for each PFT), and the second run registers the same files again under new IDs 9000001248 and 9000001249.

Aariq commented 2 years ago

Actually, maybe the above comment is a separate bug? I'm not exactly sure what is supposed to happen here. Any insight @dlebauer?

Aariq commented 2 years ago

Ok, tracked this down a bit more. runModule.get.trait.data() is passing settings$meta.analysis$update to the forceupdate argument of get.trait.data(). If settings$meta.analysis$update is TRUE, it will write to BETY; if it is anything else (e.g. "AUTO" or "FALSE"), it will not. It does not check settings$database$bety$write. Is this the correct behavior?
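The behavior described above can be sketched as follows. This is an illustration of the reported logic only, not the PEcAn.DB source: `would_write_to_bety()` is a hypothetical helper, and `update` stands in for settings$meta.analysis$update, which typically arrives as text when settings are read from XML.

``` r
# Hypothetical helper mirroring the behavior described above; not the PEcAn.DB source.
would_write_to_bety <- function(update) {
  # Only a value that parses to logical TRUE counts.
  # "AUTO" parses to NA and NULL to logical(0), so both are treated as FALSE.
  isTRUE(as.logical(update))
}

would_write_to_bety("TRUE")   # TRUE
would_write_to_bety("AUTO")   # FALSE
would_write_to_bety("FALSE")  # FALSE
would_write_to_bety(NULL)     # FALSE
```

Note that nothing in this decision looks at the BETY write tag, which is the gap this issue is about.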

mdietze commented 2 years ago

So I think you've hit on a bit of code that's given us trouble for a long time. In terms of desired behavior, the trait query and MA should NOT be running every time the workflow is run. The fact that they tend to has resulted in a massive overproliferation of Posteriors records, huge numbers of which are virtually identical. In the early days of the project, when David had a whole team of folks populating the trait database, it made more sense to update the posteriors more frequently, but at this point it should probably only occur when the user explicitly asks for an update (i.e. the default for forceupdate should be FALSE). The AUTO mode, which aimed to run the MA only when the data had changed, never did this correctly and tended to always run.

Aariq commented 2 years ago

Yeah, get.trait.data() doesn't do anything with "AUTO"; it converts anything other than "TRUE" to FALSE. But I'm confused about something: isn't the MA run by runModule.run.meta.analysis()? Why is get.trait.data() using settings$meta.analysis$update at all? Also, get.trait.data.pft() seems to have code to print messages that indicate the MA is getting re-run, but I don't see any code in that function that actually runs the meta-analysis.

Aariq commented 2 years ago

IMO a function called get_* shouldn't write anything or do any analysis. None of this behavior is documented, which is partly why this is taking me so long to figure out.

Aariq commented 1 year ago

The short-sighted fix is to give get.trait.data.pft() a write argument and have it inherit that from runModule.get.trait.data(settings).
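A rough sketch of what that could look like. Every function name and signature here is hypothetical (the `_sketch` functions are stand-ins, not the real PEcAn.DB functions), and the body only reports which action would be taken rather than touching a database.

``` r
# Hypothetical stand-in for get.trait.data.pft() with an added `write` argument.
get_trait_data_pft_sketch <- function(pft_name, forceupdate = FALSE, write = FALSE) {
  # Register posterior files only when an update is requested AND writing is allowed.
  if (forceupdate && write) {
    action <- "register posterior files in BETY"
  } else {
    action <- "skip BETY registration"
  }
  list(pft = pft_name, action = action)
}

# Hypothetical stand-in for runModule.get.trait.data(): reads both flags from
# settings and passes them down to the per-PFT function.
run_module_sketch <- function(settings) {
  # Settings parsed from XML are often character, so coerce defensively.
  write <- isTRUE(as.logical(settings$database$bety$write))
  forceupdate <- isTRUE(as.logical(settings$meta.analysis$update))
  lapply(settings$pfts, function(pft) {
    get_trait_data_pft_sketch(pft$name, forceupdate = forceupdate, write = write)
  })
}

# Placeholder settings: update requested, but writing to BETY disabled.
demo_settings <- list(
  database = list(bety = list(write = "FALSE")),
  meta.analysis = list(update = "TRUE"),
  pfts = list(pft = list(name = "SetariaWT"), pft = list(name = "ebifarm.c3grass"))
)
demo_result <- run_module_sketch(demo_settings)
demo_result[[1]]$action
```

With these placeholder settings, both PFTs skip registration even though an update was requested, which is the expected behavior from the original report.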

A maybe better solution is to have read.settings() store the write tag as an environment variable or an option and have all the relevant PEcAn.DB functions check for that option / env variable before doing anything.
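A sketch of the option-based variant, using base R's `options()` and `getOption()`. The option name `pecan.bety.write` and all three functions are made up for illustration; nothing here is claimed to exist in PEcAn.

``` r
# Hypothetical: would be called once from read.settings() after parsing the XML.
set_bety_write_option <- function(settings) {
  write <- isTRUE(as.logical(settings$database$bety$write))
  options(pecan.bety.write = write)  # assumed option name, not defined by PEcAn
  invisible(write)
}

# Hypothetical guard that any database-writing function could call first.
# Defaults to FALSE so nothing is written unless the option was set explicitly.
bety_write_allowed <- function() {
  isTRUE(getOption("pecan.bety.write", default = FALSE))
}

# Hypothetical writer: returns the files it would register, or none if writes are off.
register_posterior_sketch <- function(files) {
  if (!bety_write_allowed()) {
    return(character(0))
  }
  files
}

set_bety_write_option(list(database = list(bety = list(write = "FALSE"))))
register_posterior_sketch(c("prior.distns.csv", "trait.data.csv"))
```

The trade-off is that an option is global state: it removes the need to thread a `write` argument through every call, but functions then behave differently depending on something that is not visible in their arguments.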