achauffou opened this issue 3 years ago
First results for pol_binom_03: I have not made any plots worth showing here yet, but I have explored the Stan output a little. I have uploaded the rstan fit object to polybox. A few things I noticed:
Regarding predictability: my initial idea for analysing predictability was to compute the ROC/AUC of the entire model, but also to compute the AUC/ROC separately for each site/plant/pollinator (using only its data points). It would then be possible to compare the AUCs/ROCs of the different sites/plants/pollinators to see whether some are more or less predictable than others...
When I compute the overall AUC of the model, it falls around 0.84 (extremely close to the simulation), which I find somewhat odd. Even odder, when I compute the AUC for some sites individually (not all yet), they end up with very similar AUCs (about 0.77). I have most likely made a mistake somewhere in the code or in my reasoning, because it is unlikely that real data would be as predictable as simulated data... I will go over it once again and try to pin down the problem.
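For reference, the overall-vs-per-group AUC idea can be sketched as below. This is a minimal numpy implementation using the Mann-Whitney rank formulation of AUC; the array names (posterior-mean predicted probabilities, observed 0/1 outcomes, group labels) are hypothetical and would need to be extracted from the rstan fit.

```python
import numpy as np

def auc(y_true, y_score):
    """AUC via the Mann-Whitney statistic: the probability that a
    randomly chosen positive is scored above a randomly chosen
    negative (ties counted as 1/2)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def auc_by_group(y_true, y_score, groups):
    """Overall AUC plus one AUC per group (site, plant or
    pollinator), each computed only from that group's data points."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    groups = np.asarray(groups)
    out = {"overall": auc(y_true, y_score)}
    for g in np.unique(groups):
        mask = groups == g
        # AUC is only defined when a group contains both classes
        if 0 < y_true[mask].sum() < mask.sum():
            out[g] = auc(y_true[mask], y_score[mask])
    return out
```

Note that groups containing only presences or only absences have no defined AUC, which may itself bias comparisons across sites with few data points.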
Some plots for the results of pol_binom_03: param_post_alpha.pdf param_post_beta.pdf param_post_gamma_pla.pdf param_post_gamma_pol.pdf param_post_lambda.pdf param_post_lambda_bar.pdf param_post_sigma_beta.pdf param_post_sigma_gamma_pla.pdf param_post_sigma_gamma_pol.pdf param_post_sigma_lambda.pdf auc.pdf roc.pdf
Looking at these plots, I noticed that the lambdas all have wide credible intervals and that there is not much difference between sites. The easy way out would be to conclude that the effect of bioclimatic suitability is universal and does not depend on location, but I think these results more likely stem from a lack of signal (this effect may be difficult to estimate in the presence of the other parameters). A few reflections on this point:
I think we should meet for this.
Before the meeting though, can you do a couple of things:
Thanks for the feedback and suggestions, I will do that first thing tomorrow (I have emailed you to set up a time for a meeting).
Below are some plots for the models you suggested. Sorry for the ugly and not very useful plots for gamma_pla and gamma_pol; I should probably plot only a few illustrative parameters. I will keep working and compare the models' WAIC this afternoon.
pol_binom_02: The one with the most data points, no lambda slope. param_post_alpha.pdf param_post_beta.pdf param_post_gamma_pla.pdf param_post_gamma_pol.pdf param_post_sigma_beta.pdf param_post_sigma_gamma_pla.pdf param_post_sigma_gamma_pol.pdf
pol_binom_04: With a single lambda for all sites. param_post_alpha.pdf param_post_beta.pdf param_post_gamma_pla.pdf param_post_gamma_pol.pdf param_post_lambda.pdf param_post_sigma_beta.pdf param_post_sigma_gamma_pla.pdf param_post_sigma_gamma_pol.pdf
pol_binom_05: No gammas, site-specific lambdas. param_post_alpha.pdf param_post_beta.pdf param_post_lambda.pdf param_post_lambda_bar.pdf param_post_sigma_beta.pdf param_post_sigma_lambda.pdf
pol_binom_06: No gammas, single lambda for all sites. param_post_alpha.pdf param_post_beta.pdf param_post_lambda.pdf param_post_sigma_beta.pdf
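Since comparing these four models hinges on WAIC, here is a minimal numpy sketch of that computation, assuming the pointwise log-likelihood matrix (draws x data points) has been extracted from each fit's generated quantities block; the function name and matrix layout are assumptions, not the actual pipeline code.

```python
import numpy as np

def waic(log_lik):
    """WAIC from an (n_draws, n_points) matrix of pointwise
    log-likelihood values.

    lppd   = sum_i log( mean_s exp(log_lik[s, i]) )
    p_waic = sum_i Var_s( log_lik[s, i] )
    WAIC   = -2 * (lppd - p_waic)   (deviance scale; lower is better)
    """
    n_draws = log_lik.shape[0]
    # log of the posterior mean likelihood per point, computed stably
    lppd = np.logaddexp.reduce(log_lik, axis=0) - np.log(n_draws)
    # effective number of parameters: posterior variance of the log-lik
    p_waic = log_lik.var(axis=0, ddof=1)
    return -2 * (lppd.sum() - p_waic.sum())
```

The same log-likelihood matrices would also feed into PSIS-LOO if a more robust comparison is wanted.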
Cool, there is a lot we can learn from this. All these models tell part of the story, and on Friday we will need to define the next steps. I'll try to give these steps some thought before the meeting, but we will most likely need to brainstorm them together (be prepared to think big!).
A few thoughts:
Awesome, your ideas are super interesting and I am looking forward to discussing them further tomorrow. Until then, here is where I currently stand on your thoughts:
I will keep you updated until tomorrow.
A quick comment regarding 1.: I was calculating log-likelihood values for models with 200000 points and it worked fine (just very heavy files), so I don't think it will be that much of a problem in your case. Also, the generated quantities block only runs after the sampling, so you do not need to parallelise it (it should not take much longer).
Just a quick update about my latest struggles regarding the pollination models...
After including duplicates as replicates of a binomial distribution, as you suggested in your email, I reran all the pollination analyses. This time, however, the diagnostics were not as clean as last time (although the diagnostics on simulated data are fine):
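The duplicates-as-binomial-replicates preprocessing described above can be sketched as follows; the row format (site, plant, pollinator, 0/1 outcome) is hypothetical and the real data layout may differ.

```python
from collections import defaultdict

def aggregate_duplicates(rows):
    """Collapse duplicated site/plant/pollinator records into binomial
    counts: each Bernoulli observation becomes one trial, so duplicates
    enter the model as y successes out of n trials instead of repeated
    0/1 data points. Each row is (site, plant, pollinator, outcome)."""
    counts = defaultdict(lambda: [0, 0])  # key -> [successes, trials]
    for site, plant, pollinator, outcome in rows:
        key = (site, plant, pollinator)
        counts[key][0] += outcome  # successes
        counts[key][1] += 1        # trials
    return {k: tuple(v) for k, v in counts.items()}
```

In Stan this corresponds to switching the likelihood from `bernoulli_logit` over individual rows to `binomial_logit` over the aggregated (y, n) pairs; the posterior is the same, but the pointwise log-likelihood (and hence WAIC/LOO) is defined per aggregated observation rather than per replicate.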
Great, thanks for the advice!
Results of the first pollination models: I am currently running the following two models on the real dataset:
I will post here some results and thoughts on interpretation as well as any issue I encounter.