Correct genetic data module

ChloeRN commented 1 year ago

There had been some misunderstandings about the genetic data, which led to an inappropriate data likelihood being included in the model previously. Specifically, the probabilities of being a resident vs. an immigrant were confused, and it also seems that using a Bernoulli trial directly on the p-values output by GeneClass 2 may be inappropriate. We have to correct the error and also compare some different approaches (e.g. a Binomial likelihood with a threshold instead of a Bernoulli, rescaling p-values, changing likelihood assumptions in GeneClass 2, etc.).
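
For orientation, the two likelihood forms contrasted above can be written out roughly as follows. This is only one possible reading, not necessarily the exact formulation used in the model code. All symbols are assumptions introduced here: $p_i$ is the GeneClass 2 p-value of individual $i$ (assumed to express support for local origin), $q_i = 1 - p_i$, $\rho$ is the proportion of immigrants, $\tau$ is the threshold, and $n$ is the number of genotyped individuals.

```latex
% Individual ("Bernoulli on p-values") form: each q_i enters as a fractional outcome
L_{\mathrm{ind}}(\rho) = \prod_{i=1}^{n} \rho^{\,q_i} (1 - \rho)^{\,1 - q_i}

% Summed (Binomial with threshold) form: individuals are first classified, then counted
k = \sum_{i=1}^{n} \mathbb{1}[\, p_i < \tau \,], \qquad
L_{\mathrm{sum}}(\rho) = \binom{n}{k} \rho^{\,k} (1 - \rho)^{\,n - k}
```

Under this reading, the individual form is maximised at the mean of the $q_i$, while the summed form is maximised at $k / n$, so the two can differ substantially whenever many individuals have intermediate p-values.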

ChloeRN commented 1 year ago

I have tested and compared many different models and summarised the observations in detail in the attached file. immR_ModelTests_Notes.txt

ChloeRN commented 1 year ago

My main conclusions based on the comparisons are as follows:

We will not further pursue the second type of GeneClass analysis (log L Varanger / log L). This is because the resulting model estimates are basically identical to those from the first GeneClass analysis (Log L Varanger / Log L Max), but the models take MUCH longer to run.
We will also not further pursue any ideas of calculating "immigrant probabilities" post-hoc from relative log-likelihoods. We have no theory to support such an approach, and the resulting numbers seem to bear little relation to the results from the GeneClass analysis.
For models with a summed likelihood, we will continue using a threshold of 0.2 to distinguish immigrants from locals. This seems sensible as there is a "natural gap" in the data there, while 0.1 seems like a much more arbitrary choice.
Using an individual likelihood with the p-values as output by GeneClass likely overestimates immigration rates and consequently underestimates harvest mortality and denning survival. Based on this, it may be better to use a summed likelihood or perhaps to rescale the p-values from GeneClass prior to analysis.
Estimates from 0.2 threshold summed likelihood models are very similar to those from rescaled p-value individual likelihood models. The former tend to have somewhat higher precision for population-level estimates than the latter.
Models using yearly as opposed to pooled genetic data predict a much more stable immigration rate over time. Population patterns remain largely the same, though, through "compensation" via increased temporal variability in natural mortality and (to a lesser degree) litter size.
Covariate effects are estimated similarly across different models. Median effect sizes are largely the same, while uncertainty (esp. for effects on immigration rate and natural mortality) is higher for models using yearly genetic data. The only effect with a different mean/median across models is the reindeer:rodent interaction on natural mortality, which is predicted to be stronger in the model using yearly as opposed to pooled genetic data (at least for the summed likelihood case).
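
To make the difference between the individual and the summed likelihood more concrete, here is a minimal Python sketch. It is not the project's model code. The p-values are made up for illustration, the function names are hypothetical, and it assumes that the GeneClass p-value expresses support for local origin, so that 1 - p acts as a soft immigrant indicator in the individual likelihood.

```python
import math

def individual_loglik(r, q):
    """Log-likelihood treating each q_i in [0, 1] as a fractional Bernoulli outcome."""
    return sum(qi * math.log(r) + (1.0 - qi) * math.log(1.0 - r) for qi in q)

def summed_loglik(r, p_local, threshold=0.2):
    """Binomial log-likelihood for the number of individuals with p-value below the threshold."""
    n = len(p_local)
    k = sum(1 for p in p_local if p < threshold)
    return math.log(math.comb(n, k)) + k * math.log(r) + (n - k) * math.log(1.0 - r)

def mle_individual(q):
    """The fractional Bernoulli likelihood is maximised at the mean of q."""
    return sum(q) / len(q)

def mle_summed(p_local, threshold=0.2):
    """The Binomial likelihood is maximised at the proportion classified as immigrants."""
    return sum(1 for p in p_local if p < threshold) / len(p_local)

# Made-up p-values (assumed: support for local origin), NOT real data
p_local = [0.01, 0.05, 0.35, 0.50, 0.60, 0.75, 0.80, 0.90, 0.95, 0.99]
q = [1.0 - p for p in p_local]  # soft immigrant indicator (assumption)

print("individual-likelihood estimate:", round(mle_individual(q), 3))
print("summed-likelihood estimate (threshold 0.2):", round(mle_summed(p_local, 0.2), 3))
```

With these made-up values, the individual likelihood gives an immigrant proportion of about 0.41, whereas the summed likelihood with a 0.2 threshold gives 0.2. The gap comes entirely from individuals with intermediate p-values, each of which contributes a partial "immigrant" to the individual likelihood but none to the thresholded count.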

ChloeRN commented 1 year ago

Moving forward, I suggest the following:

The "main" model will employ a summed likelihood with a threshold of 0.2 and analyse either pooled OR yearly genetic data. I am gravitating towards the former, as we then need to make less of an assumption regarding age at immigration (at least for the genetic data).
All post-hoc analyses (LTRE, scenarios) should also be run with the equivalent model using the other data (i.e. pooled/yearly, depending on which is used in the main model). This is important because I think variance partitioning is a bit different in the two approaches.
In the Appendix (and code) we will also present a comparison of our main models (summed likelihood with 0.2 threshold) to the models using rescaled individual likelihoods and to the naive model.
In the discussion, we have to address some potential pitfalls of the assumptions we are making, such as time-matching / age at immigration and the representativeness of the genetic data.

ChloeRN commented 1 year ago

The "final" immigration model comparison for the manuscript Appendix is in the attached pdfs. PosteriorDensities.pdf PosteriorSummaries_TimeSeries.pdf

ChloeRN / VredfoxIPM

Correct genetic data module #36