Open samuelklee opened 1 month ago
This also includes checking that Shapeit4 respects relevant filters from GLnexus, which has its own method for resolving inconsistent haplotypes; see the "Hybrid allelic representation" section in the GLnexus paper (https://www.biorxiv.org/content/10.1101/343970v1.full.pdf) and note the MONOALLELIC filter, which will yield overlapping alleles if not dropped.
Actually we should double check any SV filters (in current or future VCFs) as well.
Previously we tried "F_MISSING < 0.05 & MAC >= 2". That seems a little stringent, as the recall of small variants drops a lot. Should we try "F_MISSING < 0.01"? As for MONOALLELIC filtering, I recall it is recorded in the FILTER column. Should we keep these MONOALLELIC sites, e.g. with filter -i 'FILTER=="PASS" || FILTER=="MONOALLELIC"'?
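To make the thresholds under discussion concrete, here is a minimal Python sketch of the site-level logic, not the pipeline's actual implementation. It assumes biallelic sites, treats a genotype as missing if any of its alleles is missing (which may differ from how bcftools computes F_MISSING), and uses hypothetical helper names (`f_missing`, `minor_allele_count`, `keep_site`) that are not part of any tool.

```python
from typing import Optional, Sequence, Tuple

# A genotype is a tuple of allele indices, e.g. (0, 1); None marks a missing allele.
Genotype = Tuple[Optional[int], ...]


def f_missing(genotypes: Sequence[Genotype]) -> float:
    """Fraction of samples whose genotype has at least one missing allele."""
    if not genotypes:
        return 0.0
    missing = sum(1 for gt in genotypes if any(a is None for a in gt))
    return missing / len(genotypes)


def minor_allele_count(genotypes: Sequence[Genotype]) -> int:
    """Minor allele count over called alleles (biallelic assumption: 0 = REF, non-zero = ALT)."""
    called = [a for gt in genotypes for a in gt if a is not None]
    alt = sum(1 for a in called if a != 0)
    return min(alt, len(called) - alt)


def keep_site(
    genotypes: Sequence[Genotype],
    filter_field: str,
    max_f_missing: float = 0.05,
    min_mac: int = 2,
    allowed_filters: Sequence[str] = ("PASS",),
) -> bool:
    """Keep a site only if its FILTER value is allowed and both thresholds pass."""
    return (
        filter_field in allowed_filters
        and f_missing(genotypes) < max_f_missing
        and minor_allele_count(genotypes) >= min_mac
    )


if __name__ == "__main__":
    # 100 samples: 97 hom-ref, 2 het, 1 missing.
    site = [(0, 0)] * 97 + [(0, 1), (0, 1)] + [(None, None)]
    print(keep_site(site, "PASS", max_f_missing=0.05))
    print(keep_site(site, "PASS", max_f_missing=0.01))
    print(keep_site(site, "MONOALLELIC", allowed_filters=("PASS", "MONOALLELIC")))
```

Under this definition, lowering the missingness threshold from 0.05 to 0.01 can only remove more sites, never fewer, so it would not be expected to recover recall on its own. Passing `allowed_filters=("PASS", "MONOALLELIC")` mirrors the `FILTER=="PASS" || FILTER=="MONOALLELIC"` expression above.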
Currently we just use MAC >= 2 to drop singletons; it's possible that filtering on missingness, etc., as in similar 1kG pipelines, could be useful.