awilfert / PSAP-pipeline

Variant in *multianno.txt but not in report #5

Closed JonathanRios1 closed 7 years ago

JonathanRios1 commented 7 years ago

Is there a reason why a variant that is in the VCF file and in the *multianno.txt would not be in the report.txt output? Could it be differences between annotating with Ensembl versus RefSeq?

Thanks.

Jonathan

awilfert commented 7 years ago

Hi @JonathanRios1,

Not all of the variants output in the *_multianno file will be output in the report.txt file. There are several reasons for this:

1) The variant is removed in one of the several cleaning steps implemented prior to calculating the PSAP p-values. These cleaning steps remove possible artifacts/false positive variant calls, variants in regions that were not considered when generating our null models, and variants with missing information. Variants removed at this step can be found in the *_missing.txt file.

2) The PSAP p-values are calculated using a test statistic that considers only the most pathogenic variant in the gene under the REC-hom and DOM-het disease models, and the second most pathogenic variant under the REC-chet model (this variant is paired with the DOM-het variant in the report file). Less pathogenic variants in the same gene are therefore not reported.

3) Prior to generating the report.txt file, the PSAP annotated variants undergo a validation step. Variants will not be reported if they are present in unaffected individuals in a genotype that is not compatible with the disease model under which the variant is being considered. These variants will be present in the *_popStat.txt file, but not in the report.txt file.
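
To work out which of the three reasons applies to a particular variant, one option is to check which output files contain it. The Python sketch below is not part of the pipeline. It assumes every file is tab-delimited with a header row and uses ANNOVAR-style key columns (`Chr`, `Start`, `Ref`, `Alt`); the column names in the *_missing.txt and *_popStat.txt files may differ, so adjust `KEY_COLUMNS` to match your headers. The function names and the `paths` dictionary are hypothetical.

```python
import csv
from pathlib import Path

# Assumed key columns (ANNOVAR-style); change these to match your file headers.
KEY_COLUMNS = ("Chr", "Start", "Ref", "Alt")


def variant_in_file(path, variant, key_columns=KEY_COLUMNS):
    """Return True if a tab-delimited file with a header row contains the variant."""
    path = Path(path)
    if not path.exists():
        return False
    with path.open(newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # Compare as strings, since csv yields every field as text.
            if all(row.get(col) == str(val) for col, val in zip(key_columns, variant)):
                return True
    return False


def explain_missing_variant(paths, variant):
    """Guess why a variant is absent from the report, based on which files contain it.

    `paths` maps "report", "missing" and "popstat" to the corresponding output files.
    """
    if variant_in_file(paths["report"], variant):
        return "reported"
    if variant_in_file(paths["missing"], variant):
        return "removed during cleaning (reason 1)"
    if variant_in_file(paths["popstat"], variant):
        return "failed validation against unaffected individuals (reason 3)"
    # Absent from all three files: reason 2 is the remaining candidate, not a certainty.
    return "possibly not the most pathogenic variant in its gene (reason 2)"
```

For example, `explain_missing_variant({"report": ..., "missing": ..., "popstat": ...}, ("1", 12345, "A", "G"))` returns a short label pointing at the most likely explanation. The last case is only a guess by elimination, since the sketch does not check the pathogenicity ranking within the gene.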

Hope this helps clear things up!