HighlanderLab / tree_seq_pipeline

Pipeline to infer tree sequences with different datasets
MIT License

Do we need a step to remove non-biallelic SNPs from the VCF? #39

Open janaobsteter opened 10 months ago

janaobsteter commented 10 months ago

In the ancestral allele inference, we filter the variants for SNPs, but we do allow multi-allelic SNPs, since that is not a problem for the ancestral inference (it's just an allele count). However, some rules might fail if there are multi-allelic variants in the VCF and the INFO file (for example, the get_major rule in prepare_files_for_tsinfer.smk).

gregorgorjanc commented 10 months ago

We discussed in one of the meetings how we could convert multi-allelic sites to multiple bi-allelic sites. My understanding was that bcftools enables this conversion. Would that help?
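For reference, the bcftools option I had in mind is, as far as I know, `bcftools norm -m -any`, which splits multi-allelic records into bi-allelic ones. To show what the conversion does to a record, here is a minimal Python sketch. It is an illustration only, not pipeline code: the function name `split_multiallelic` is hypothetical, and it only rewrites the ALT column, so it assumes a sites-only line and leaves INFO values and genotypes untouched (a real tool has to recode those too).

```python
def split_multiallelic(line):
    """Split one tab-separated VCF data line into one line per ALT allele.

    Sketch only: the ALT column (index 4) is rewritten, every other
    column is copied unchanged. Per-allele INFO fields and genotype
    columns are NOT recoded here.
    """
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")  # e.g. "C,T" -> ["C", "T"]
    records = []
    for alt in alts:
        rec = list(fields)   # copy so each output record is independent
        rec[4] = alt         # keep a single ALT allele per record
        records.append("\t".join(rec))
    return records


if __name__ == "__main__":
    site = "1\t100\t.\tA\tC,T\t.\tPASS\t."
    for rec in split_multiallelic(site):
        print(rec)
```

A bi-allelic line comes back as a single unchanged record, so the function can be applied to every data line of the file.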

janaobsteter commented 9 months ago

For now, let's filter the VCF to only keep biallelic SNPs to see how it behaves and then we can progress with adding other types of variants in the future.
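In the pipeline this would most likely be a bcftools call (something like `bcftools view -m2 -M2 -v snps`, though the exact rule is not decided here). As a rough sketch of the filter logic itself, assuming plain-text VCF lines and treating a biallelic SNP as a single-base REF with exactly one single-base ALT, it could look like the following. The helper names `is_biallelic_snp` and `filter_biallelic_snps` are made up for this example.

```python
BASES = {"A", "C", "G", "T"}


def is_biallelic_snp(line):
    """Return True if a VCF data line has one single-base REF and one single-base ALT."""
    fields = line.rstrip("\n").split("\t")
    ref = fields[3].upper()
    alts = fields[4].upper().split(",")
    # Exactly one ALT allele, and both alleles are single nucleotides.
    # This also drops missing ("."), spanning-deletion ("*") and indel alleles.
    return len(alts) == 1 and ref in BASES and alts[0] in BASES


def filter_biallelic_snps(lines):
    """Yield header lines unchanged and only those data lines that are biallelic SNPs."""
    for line in lines:
        if line.startswith("#") or is_biallelic_snp(line):
            yield line


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t100\t.\tA\tC\t.\tPASS\t.",      # biallelic SNP, kept
        "1\t200\t.\tA\tC,T\t.\tPASS\t.",    # multi-allelic SNP, dropped
        "1\t300\t.\tAT\tA\t.\tPASS\t.",     # deletion, dropped
    ]
    for kept in filter_biallelic_snps(vcf):
        print(kept)
```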