Closed abigailsnyder closed 4 years ago
Per @claudiatebaldi, we would expect T and P to be jointly estimated and jointly sampled.
@abigailsnyder will update `data_raw` to jointly estimate 24 parameters, giving a joint multivariable beta distribution of the T and P fractions, and add more documentation to those functions. Then go back into the `monthly_downscaling` code and update the sampling to be joint, as well as adding the options outlined in https://github.com/JGCRI/an2month/issues/16
In terms of updating the sampling to be joint, it looks like the separate sampling for each variable is happening in the cassandra components code: https://github.com/JGCRI/cassandra/blob/master/cassandra/components.py Lines 964-977
That explains why it's harder to tell from the R `monthly_downscaling` sampling code that the variables are being treated separately than it is from the `data_raw/...` training code.
So the R code will have to be updated for the sampling but then the cassandra code will also have to be updated, FYI @crvernon
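For reference, sampling each variable separately amounts to something like the minimal sketch below. This is not the cassandra or `an2month` code: the helper name `sample_dirichlet`, the parameter values, and the choice of a Gamma-normalization sampler are all assumptions made for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)

# Hypothetical fitted parameters, 12 per variable (one per month).
# Real values would come from the training code under data_raw.
alpha_t = [50.0] * 12
alpha_p = [2.0 + m for m in range(12)]

# Two independent draws: nothing ties the T fractions to the P fractions.
frac_t = sample_dirichlet(alpha_t, rng)
frac_p = sample_dirichlet(alpha_p, rng)
```

Each draw yields 12 monthly fractions that sum to 1, but the two draws share no information, which is the behavior that would need to change for joint sampling.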
@claudiatebaldi @kdorheim In working through the code in more depth for this enhancement (https://github.com/JGCRI/an2month/issues/16), I've done a more careful, line-by-line combing through the nested functions in `data_raw/L3_fit_dirichlet_params.R` and `data_raw/jobrun.zsh`. I think the code is estimating the parameters of a multivariable beta distribution for the temperature data, and a separate set of parameters for the precipitation data. At least I think so. I didn't catch it when I was initially learning the `an2month` package, I think because of how the functions are nested, and because treating T and P separately is different from the very early notes I contributed when we were figuring out what the sampling should look like (around Dec 2018); I wasn't involved in the actual work after that. Then so many issues came up with how fldgen was being called in the pipeline that I didn't return to this until last week/this week.

So do we want to keep T and P separate the way they're implemented, or do we want to estimate 24 parameters together (like I initially thought was happening)? Also, any thoughts on continuing to use a multivariate beta distribution?
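To make the "24 parameters together" option concrete, here is one possible reading as a rough sketch: a single 24-component multivariate beta (Dirichlet) draw, with the first 12 components taken as T and the last 12 as P, each block renormalized to sum to 1. The function names, the block ordering, and the parameter values are hypothetical and are not taken from `an2month` or cassandra.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def split_and_renormalize(joint, n_months=12):
    """Split a joint draw into T and P blocks and rescale each block to sum to 1."""
    t_block, p_block = joint[:n_months], joint[n_months:]
    t_sum, p_sum = sum(t_block), sum(p_block)
    return [x / t_sum for x in t_block], [x / p_sum for x in p_block]

rng = random.Random(0)

# Hypothetical joint parameter vector: 12 values for T followed by 12 for P.
alpha_joint = [50.0] * 12 + [2.0 + m for m in range(12)]

joint = sample_dirichlet(alpha_joint, rng)        # 24 fractions summing to 1
frac_t, frac_p = split_and_renormalize(joint)     # 12 + 12, each summing to 1
```

One caveat relevant to the last question: for a plain Dirichlet, the renormalized blocks are statistically independent of each other, so this particular construction would not by itself introduce any T-P dependence.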