gaow / SEQLinkage

Collapsed Haplotype Pattern Method for Linkage Analysis of Next-Generation Sequencing Data
MIT License

Is it reasonable to use marker freq as the population frequency of the disease allele? #38

Open changebio opened 2 years ago

changebio commented 2 years ago

In paramlink2, we need to set the dfreq for diseaseModel. For example: dm = diseaseModel(chrom = "AD", penetrances = c(0,1,1), dfreq = 1e-5).

gaow commented 2 years ago

Is dfreq the population frequency of the disease allele? If so, then yes, the marker frequency should be the population frequency of the markers.

changebio commented 2 years ago

In common variant linkage analysis, what about variants with a high frequency, such as 0.4113 or 0.3247?

[Image: Screen Shot 2022-03-02 at 10 07 36 AM]

gaow commented 2 years ago

Common variants are a different context, where your marker allele is not necessarily the (unobserved) disease allele ... It seems safe to set it to a low number: take a look at Figure 1 of this paper, and at the discussion section about setting it to 0.01 for a dominant gene and 0.1 for a recessive one.

gaow commented 2 years ago

Also, this review paper has a formula to estimate the disease allele frequency from the penetrance for disease, the penetrance for phenocopies, and the prevalence.
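For anyone who wants to try this numerically, here is a minimal sketch of one common way to back out a disease allele frequency from prevalence. It assumes Hardy-Weinberg proportions and a single biallelic disease locus, so it may not match the exact formula in the review paper. The function names and the bisection solver are illustrative choices, not part of SEQLinkage or paramlink2. The penetrance triple `(f0, f1, f2)` is assumed to follow the same order as `penetrances = c(0,1,1)` above, with `f0` playing the role of the phenocopy rate.

```python
def prevalence(q, f0, f1, f2):
    """Population prevalence implied by disease allele frequency q,
    assuming Hardy-Weinberg genotype proportions.
    f0, f1, f2: penetrance for 0, 1, 2 copies of the disease allele
    (f0 is the phenocopy rate)."""
    p = 1.0 - q
    return f0 * p * p + f1 * 2.0 * p * q + f2 * q * q


def disease_allele_freq(K, f0, f1, f2, tol=1e-12):
    """Solve prevalence(q) = K for q in [0, 1] by bisection.
    Requires f0 <= f1 <= f2 so that prevalence is non-decreasing in q."""
    if not (f0 <= f1 <= f2):
        raise ValueError("expected f0 <= f1 <= f2")
    if not (f0 <= K <= f2):
        raise ValueError("prevalence K must lie between f0 and f2")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if prevalence(mid, f0, f1, f2) < K:
            lo = mid  # root is to the right of mid
        else:
            hi = mid  # root is at or to the left of mid
    return (lo + hi) / 2.0


# Fully penetrant dominant model, no phenocopies
print(disease_allele_freq(0.0199, 0.0, 1.0, 1.0))
# Fully penetrant recessive model, no phenocopies
print(disease_allele_freq(0.01, 0.0, 0.0, 1.0))
```

Under these assumptions, a prevalence of 0.0199 with a fully penetrant dominant model corresponds to q of about 0.01, and a prevalence of 0.01 with a fully penetrant recessive model corresponds to q of about 0.1.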

changebio commented 2 years ago

> Common variants are a different context, where your marker allele is not necessarily the (unobserved) disease allele ... It seems safe to set it to a low number: take a look at Figure 1 of this paper, and at the discussion section about setting it to 0.01 for a dominant gene and 0.1 for a recessive one.

It looks like 0.01 is a common setting for the dominant model.

changebio commented 2 years ago

> Also, this review paper has a formula to estimate the disease allele frequency from the penetrance for disease, the penetrance for phenocopies, and the prevalence.

In Figure 1 of the review article, why are common variants (MAF > 0.05) removed for family-based whole-genome sequencing analysis?

changebio commented 2 years ago

For the APOE gene, I compared the models using different frequencies. The conclusion is that the lower the frequency, the higher the LOD score. The solid line without circle markers is the result from the actual frequency (the x-axis runs from 0 to 0.45).

[Image]

gaow commented 2 years ago

> In Figure 1 of the review article, why are common variants (MAF > 0.05) removed for family-based whole-genome sequencing analysis?

This is a typical protocol for filter-based variant discovery in Mendelian diseases. The variants to be identified usually have high penetrance. If penetrance is high and the disease variants are common, the disease is no longer a rare Mendelian one. That's why these variants are removed at the beginning.
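To make the argument concrete, here is a quick back-of-the-envelope check of how fast prevalence grows with the disease allele frequency q. It assumes Hardy-Weinberg proportions and a fully penetrant dominant model with no phenocopies; both are simplifying assumptions, and the helper name is made up for illustration.

```python
def dominant_prevalence(q):
    """Prevalence for a fully penetrant dominant allele with frequency q,
    assuming Hardy-Weinberg proportions and no phenocopies: 1 - (1 - q)^2."""
    return 1.0 - (1.0 - q) ** 2


# Rare allele, the 0.01 setting, the 0.05 filter threshold, and a common marker
for q in (1e-5, 0.01, 0.05, 0.4113):
    print(f"q = {q:<8g} prevalence = {dominant_prevalence(q):.4f}")
```

Under these assumptions, roughly 10% of the population would be affected at q = 0.05, and about 65% at q = 0.4113, which is clearly not a rare disease.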

gaow commented 2 years ago

> I compared the models using different frequencies. The conclusion is that the lower the frequency, the higher the LOD score.

Could you clarify what you meant by "frequency"? Is it the "disease frequency" we discussed? And what is on the Y-axis: are they LOD scores?

changebio commented 2 years ago

> > I compared the models using different frequencies. The conclusion is that the lower the frequency, the higher the LOD score.

> Could you clarify what you meant by "frequency"? Is it the "disease frequency" we discussed? And what is on the Y-axis: are they LOD scores?

The frequency is the disease allele frequency. The Y-axis is the sum of LOD scores across the different families.
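As a small illustration of what the Y-axis holds, here is a sketch of summing per-family LOD scores at each tested disease allele frequency. The family IDs and LOD values are made up for illustration and are not the APOE results. Summing assumes the families are independent and analyzed under the same model.

```python
# Hypothetical per-family LOD scores, one value per tested disease allele frequency.
dfreqs = [0.001, 0.01, 0.1]
family_lods = {
    "fam1": [1.2, 0.9, 0.3],
    "fam2": [0.5, 0.4, -0.1],
    "fam3": [-0.2, -0.3, -0.6],
}


def total_lod(lods_by_family):
    """Sum LOD scores across families, position by position."""
    return [sum(scores) for scores in zip(*lods_by_family.values())]


totals = total_lod(family_lods)
for q, lod in zip(dfreqs, totals):
    print(f"dfreq = {q:g}: summed LOD = {lod:.2f}")
```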

changebio commented 2 years ago

> > In Figure 1 of the review article, why are common variants (MAF > 0.05) removed for family-based whole-genome sequencing analysis?

> This is a typical protocol for filter-based variant discovery in Mendelian diseases. The variants to be identified usually have high penetrance. If penetrance is high and the disease variants are common, the disease is no longer a rare Mendelian one. That's why these variants are removed at the beginning.

So is it not necessary to do common variant linkage analysis for family data? Should the analysis focus on rare variants?

changebio commented 2 years ago

The figure I showed is based on common variants (MAF > 0.05).