broadinstitute / lrma-aou1-panel-creation

Pipelines and evaluations covering integration, phasing, and imputation of short and structural variants for the AoU Phase 1 long-reads callset.

Double check filters before Shapeit4. #23

Open samuelklee opened 1 month ago

samuelklee commented 1 month ago

Currently we only filter on MAC >= 2 to drop singletons. Additional filters, such as on missingness, as used in similar 1kG pipelines, could also be useful.
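To illustrate the two site-level statistics in question, here is a minimal Python sketch that computes MAC and the fraction of samples with a missing genotype for a single biallelic site from raw GT strings. The function names and the missingness definition (a sample counts as missing if any of its alleles is '.') are assumptions for illustration. They are not necessarily how the pipeline or bcftools computes these values.

```python
from typing import List, Optional, Tuple


def parse_gt(gt: str) -> List[str]:
    """Split a VCF GT string such as '0|1' or './.' into allele tokens."""
    return gt.replace("|", "/").split("/")


def site_stats(gts: List[str]) -> Tuple[int, float]:
    """Return (MAC, F_MISSING) for one biallelic site.

    Assumptions (illustrative only): the site is biallelic, MAC is
    min(#REF alleles, #ALT alleles) over called alleles, and a sample
    counts as missing if any of its alleles is '.'.
    """
    ref = alt = 0
    missing_samples = 0
    for gt in gts:
        alleles = parse_gt(gt)
        if "." in alleles:
            missing_samples += 1
        for allele in alleles:
            if allele == "0":
                ref += 1
            elif allele == "1":
                alt += 1
    mac = min(ref, alt)
    f_missing = missing_samples / len(gts) if gts else 0.0
    return mac, f_missing


def passes(gts: List[str], min_mac: int = 2,
           max_f_missing: Optional[float] = None) -> bool:
    """Keep the site if MAC >= min_mac and, optionally, F_MISSING < max_f_missing."""
    mac, f_missing = site_stats(gts)
    if mac < min_mac:
        return False  # e.g. singletons when min_mac == 2
    if max_f_missing is not None and f_missing >= max_f_missing:
        return False
    return True


print(site_stats(["0|1", "1|0", "./.", "0|0"]))  # (2, 0.25)
```

For example, a site with one ALT allele among all called genotypes has MAC 1 and is dropped by the current filter, while passing `max_f_missing` shows how a missingness threshold would act on top of it.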

samuelklee commented 1 month ago

This also includes checking that Shapeit4 respects relevant filters from GLnexus, which has its own method for resolving inconsistent haplotypes; see the "Hybrid allelic representation" section in the GLnexus paper (https://www.biorxiv.org/content/10.1101/343970v1.full.pdf) and note the MONOALLELIC filter, which will yield overlapping alleles if not dropped.
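To make the overlap concern concrete, here is a minimal Python sketch using hypothetical toy records (not real GLnexus output). It keeps records whose FILTER value is in an allowed set, then reports pairs of records on the same contig whose REF spans overlap. Using the full REF span as a record's footprint is a simplifying assumption; for instance, it includes the padding base of a deletion.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Record:
    chrom: str
    pos: int  # 1-based VCF POS
    ref: str
    alt: str
    filter: str

    @property
    def end(self) -> int:
        # Last reference base covered by REF (1-based, inclusive).
        return self.pos + len(self.ref) - 1


def keep_filters(records: Iterable[Record], allowed: Set[str]) -> List[Record]:
    """Keep only records whose FILTER value is in `allowed`."""
    return [r for r in records if r.filter in allowed]


def overlapping_pairs(records: List[Record]) -> List[Tuple[int, int]]:
    """Return index pairs of records whose REF spans overlap on the same contig.

    Assumes records are sorted by (chrom, pos), so the inner scan can stop at
    the first record on another contig or starting past the current end.
    """
    pairs = []
    for i, a in enumerate(records):
        for j in range(i + 1, len(records)):
            b = records[j]
            if b.chrom != a.chrom or b.pos > a.end:
                break
            pairs.append((i, j))
    return pairs


# Hypothetical example: a PASS deletion and a MONOALLELIC SNV inside it.
RECORDS = [
    Record("chr1", 100, "ACGT", "A", "PASS"),
    Record("chr1", 102, "G", "T", "MONOALLELIC"),
    Record("chr1", 200, "C", "G", "PASS"),
]
print(overlapping_pairs(keep_filters(RECORDS, {"PASS"})))                 # []
print(overlapping_pairs(keep_filters(RECORDS, {"PASS", "MONOALLELIC"})))  # [(0, 1)]
```

A check like this could be run on the phasing input to confirm whether any retained records still overlap.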

samuelklee commented 1 month ago

Actually, we should also double-check any SV filters (in current or future VCFs).

hangsuUNC commented 1 month ago

Previously we tried "F_MISSING < 0.05 & MAC >= 2". It seems a little stringent, as the recall of small variants drops a lot. Should we try "F_MISSING < 0.01"? As for MONOALLELIC filtering, I remember it is in the FILTER column. Should we keep these MONOALLELIC sites, e.g. by filtering with -i 'FILTER=="PASS" || FILTER=="MONOALLELIC"'?
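To compare thresholds, here is a small Python sketch of the keep/drop logic discussed above, run on hypothetical per-site values. It mirrors the proposed `FILTER=="PASS" || FILTER=="MONOALLELIC"` condition combined with MAC >= 2 and F_MISSING below a threshold. The helper names and the exact-string treatment of FILTER are assumptions, not bcftools internals.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-site summaries: (FILTER, MAC, F_MISSING).
SITES: List[Tuple[str, int, float]] = [
    ("PASS", 5, 0.00),
    ("PASS", 3, 0.02),
    ("MONOALLELIC", 4, 0.03),
    ("PASS", 1, 0.00),       # singleton: fails MAC >= 2
    ("LowQual", 10, 0.00),   # hypothetical FILTER value outside the kept set
]


def keep_site(filter_value: str, mac: int, f_missing: float,
              max_f_missing: float,
              allowed: Sequence[str] = ("PASS", "MONOALLELIC")) -> bool:
    # Exact match on the whole FILTER string; a multi-valued field such as
    # "MONOALLELIC;OTHER" would NOT match in this sketch.
    return filter_value in allowed and mac >= 2 and f_missing < max_f_missing


def kept(sites: List[Tuple[str, int, float]], max_f_missing: float) -> List[int]:
    """Indices of sites that survive the combined filter."""
    return [i for i, (flt, mac, fm) in enumerate(sites)
            if keep_site(flt, mac, fm, max_f_missing)]


print(kept(SITES, 0.05))  # [0, 1, 2]
print(kept(SITES, 0.01))  # [0]
```

Because a site is kept only when F_MISSING is below the threshold, lowering it from 0.05 to 0.01 keeps a subset of the sites kept at 0.05; raising the threshold is what relaxes the filter.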