Sendrowski / fastDFE

Fast and flexible inference of the distribution of fitness effects (DFE), VCF-SFS parsing with ancestral allele and site-degeneracy annotation.
https://fastdfe.readthedocs.io
GNU General Public License v3.0

Joint inference with fixed parameters #11

Open AudeCaizergues opened 3 hours ago

AudeCaizergues commented 3 hours ago

Hello,

I am trying to run a joint inference from the SFSs of two different populations and I have some questions about the process.

  1. Given the nature of my data and my questions, I only want to estimate purifying selection in my two populations. I understand that we can constrain parameters to that end, e.g. p_b=0 so that S cannot be positive. However, in the manual (section DFE inference > Base inference > Fixing parameters), I read that "We also force the DFE to be purely deleterious, by fixing S_b to some arbitrary value and setting p_b to 0". I understand why we fix p_b to 0, but I wonder why we also have to fix S_b (to 1 in this case) to estimate only deleterious mutations. Could you please elaborate on that?

  2. I found no example combining joint inference with fixed parameters, so I wrote my script as follows (eps is shared and p_b is fixed to 0; I'll add S_b=1 if need be):

import fastdfe as fd

inf = fd.JointInference(
    sfs_neut=sfs_neut,
    sfs_sel=sfs_sel,
    # share the same ancestral-allele misidentification rate between types
    shared_params=[fd.SharedParams(types=["urb", "rur"], params=["eps"])],
    fixed_params=dict(all=dict(p_b=0)),  # fix p_b so we only infer purifying selection
    do_bootstrap=True
)

Is this the proper way to (visually) test for differences in selection between my populations?
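For completeness, here is how I then plan to run the inference and inspect the results (assuming plot_discretized is available on JointInference as it is on BaseInference, overlaying the discretized DFEs of both types):

inf.run()  # run the joint optimization

# visually compare the inferred (purely deleterious) DFEs of "urb" and "rur"
inf.plot_discretized()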

  3. In addition to plotting the selection results from my SFS, I would like to formally test whether the patterns of selection in my two populations differ significantly (e.g., whether purifying selection is stronger in one of them). I see the function "inf.perform_lrt_shared()", which in your example gives a p-value of 0.451 (I guess it's a p-value at least), and you mention that "The test is not significant, indicating that the simpler model of sharing the parameters explains the data sufficiently well. Indeed, we do not observe a lot of differences between the inferred parameters of joint and the marginal inferences."

When I run this test on my data I get:

>>> inf.perform_lrt_shared()
INFO:JointInference: Simple model likelihood: -235.27340578049512, complex model likelihood: -235.14460149932972, degrees of freedom: 1.
0.6117678851908046

But I have a hard time figuring out what this means in terms of eps and p_b, and I hesitate between two interpretations:

Thank you in advance for your help,

Aude

Sendrowski commented 2 hours ago

Hi Aude,

  1. The idea is to fix S_b to an arbitrary value to avoid optimizing an unneeded parameter: with p_b=0 the beneficial DFE component has zero weight, so S_b no longer affects the likelihood and is unidentifiable.
  2. Yes, I think that should do the trick. In addition, I would also fix S_b to an arbitrary value here, and of course check that the parameters were indeed held fixed by inspecting JointInference.params_mle.
  3. The likelihood of the more complex model (with more parameters) is obtained by summing the log-likelihoods of the marginal component spectra (types "urb" and "rur" in your case). You can obtain the marginal inference objects by accessing JointInference.marginal_inferences, which is a dictionary of BaseInference objects for the different types. (You can ignore the "all" entry, which holds the BaseInference results for the sum of all component spectra.) The marginal inferences inherit the fixed parameters from JointInference (which you can verify by checking JointInference.marginal_inferences[type].fixed_params), so what you're effectively testing is whether eps varies between the rural and urban populations (which might not be so informative after all). If you want to test whether purifying selection differs, you would instead share S_d or b or both between your types, as in the sketch below.
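For instance, something along these lines (an untested sketch reusing your spectra from above; p_b and S_b are fixed as before, and the null model constrains S_d and b to be equal across types):

import fastdfe as fd

# H0: "urb" and "rur" share the same deleterious DFE (S_d and b)
# H1: each type gets its own S_d and b
inf_shared = fd.JointInference(
    sfs_neut=sfs_neut,
    sfs_sel=sfs_sel,
    shared_params=[fd.SharedParams(types=["urb", "rur"], params=["S_d", "b"])],
    fixed_params=dict(all=dict(p_b=0, S_b=1)),  # purely deleterious DFE
    do_bootstrap=True
)
inf_shared.run()

print(inf_shared.params_mle)  # verify that p_b and S_b were indeed held fixed

# a small p-value would indicate purifying selection differs between populations
p = inf_shared.perform_lrt_shared()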

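As for the number perform_lrt_shared() returns: it is the p-value of a standard likelihood-ratio test, which you can reproduce from the logged likelihoods yourself (a quick check with scipy):

from scipy.stats import chi2

ll_simple = -235.27340578049512   # eps shared between types
ll_complex = -235.14460149932972  # sum of the marginal log-likelihoods
stat = 2 * (ll_complex - ll_simple)
print(chi2.sf(stat, df=1))  # ~0.6118, the value returned above

So in your case the test is far from significant: there is no evidence that eps differs between your populations, and the simpler shared-eps model explains the data sufficiently well.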
I hope this helps,

Janek