SBIMB / StellarPGx

Calling star alleles in highly polymorphic pharmacogenes (e.g. CYP450 genes) by leveraging genome graph-based variant detection.
MIT License

Checking SV calls for CYP2D6 #6

Open mgonzalezporta opened 2 years ago

mgonzalezporta commented 2 years ago

We've been running the CYP2D6 genotyper on NA12878, with outputs as follows:

tree cyp2d6/
cyp2d6/
├── alleles
│   └── NA12878_cyp2d6.alleles
└── variants
    ├── NA12878_cyp2d6.vcf.gz
    └── NA12878_cyp2d6.vcf.gz.tbi

The reported genotype is *3/*68+*4, which matches the expected genotype. However, when checking the variant calls that support each allele, we found that the output VCF only contains evidence for SNVs and INDELs. Where can we find the equivalent outputs for SVs?
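For reference, the kind of check described above can be sketched as follows. This is a minimal illustration and not part of StellarPGx: the function name `classify_vcf_records` and the example records (including their positions) are made up, and it assumes that SV records, if present, would carry an `SVTYPE` INFO key or a symbolic/breakend ALT allele.

```python
def classify_vcf_records(lines):
    """Count SNV, INDEL and SV records in an iterable of VCF text lines.

    Hypothetical helper for illustration only. Anything that is neither
    an SV nor a single-base substitution is lumped under "INDEL".
    """
    counts = {"SNV": 0, "INDEL": 0, "SV": 0}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        ref, alt, info = fields[3], fields[4], fields[7]
        alts = alt.split(",")
        info_keys = {item.split("=", 1)[0] for item in info.split(";")}
        # Symbolic alleles (<DUP>, <DEL>, ...) and breakends contain < [ or ]
        symbolic = any(a.startswith("<") or "[" in a or "]" in a for a in alts)
        if "SVTYPE" in info_keys or symbolic:
            counts["SV"] += 1
        elif all(len(a) == len(ref) == 1 for a in alts):
            counts["SNV"] += 1
        else:
            counts["INDEL"] += 1
    return counts


# Made-up records with placeholder positions, not real StellarPGx output.
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12878",
    "chr22\t100\trs1065852\tG\tA\t.\tPASS\t.\tGT\t0/1",
    "chr22\t200\t.\tCT\tC\t.\tPASS\t.\tGT\t0/1",
    "chr22\t300\t.\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=900\tGT\t0/1",
]

print(classify_vcf_records(example_lines))  # {'SNV': 1, 'INDEL': 1, 'SV': 1}
```

On a real run the same function could be fed the lines of `cyp2d6/variants/NA12878_cyp2d6.vcf.gz` opened with `gzip.open(path, "rt")`. In our case the SV count would be zero.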

Thanks in advance.

twesigomwedavid commented 2 years ago

Hi @mgonzalezporta,

Thanks for raising this. For a hybrid such as CYP2D6*68 to be called, the read depth and the allele depth/ratio of key SNVs (e.g. rs1065852 in this case) have to meet the conditions defined in stellarpgx.py and sv_modules.py.
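To give a rough idea of the quantities involved, here is a minimal sketch of reading the depth and alternate-allele ratio of a key SNV from a VCF record. It is not the code in stellarpgx.py or sv_modules.py, and it does not reproduce their conditions. The function name `allele_ratio` and the example record are hypothetical, and it assumes the record has an `AD` (allelic depths) FORMAT field.

```python
def allele_ratio(vcf_line, sample_index=0):
    """Return (total_depth, alt_fraction) for one sample of a VCF record.

    Hypothetical helper for illustration only. Assumes the FORMAT column
    includes AD with comma-separated depths, reference allele first.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    sample_values = fields[9 + sample_index].split(":")
    values = dict(zip(format_keys, sample_values))
    depths = [int(d) for d in values["AD"].split(",")]
    total = sum(depths)
    if total == 0:
        return total, None  # no reads, so the ratio is undefined
    # Fraction of reads supporting the first alternate allele
    return total, depths[1] / total


# Made-up record with a placeholder position, not real StellarPGx output.
record = "chr22\t100\trs1065852\tG\tA\t.\tPASS\t.\tGT:AD:DP\t0/1:20,10:30"

depth, ratio = allele_ratio(record)
print(depth, ratio)
```

A ratio that departs from what a plain heterozygous or homozygous call would give, together with the read depth, is the kind of evidence the conditions above refer to. The actual thresholds live in the two scripts and are not shown here.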

As you know, hybrids such as CYP2D6*68, *13, and *36 are very difficult to represent in VCF format. However, you're right that we should make it clear how StellarPGx arrives at these SV calls. We'll explore the option of capturing this in a more verbose output, e.g. a log file or equivalent.