SBIMB / StellarPGx

Calling star alleles in highly polymorphic pharmacogenes (e.g. CYP450 genes) by leveraging genome graph-based variant detection.
MIT License
30 stars 7 forks source link

StellarPGx v1.2.5 incorrectly calls GeT-RM samples NA10847, NA19920 and NA19908 that were correctly called by the original version #24

Closed VasilisFlouris closed 10 months ago

VasilisFlouris commented 1 year ago

Hello there,

while testing StellarPGx v1.2.5 on CYP2D6-calling on samples from GeT-RM I encountered something unexpected. On your publication on StellarPGx, samples NA10847, NA19920 and NA19908 are correctly called as 1/41, 1/4x2 and 1/46 respectively. However v1.2.5 incorrectly calls these samples as 39/41, 1/4 and 43/45 respectively.

The BAM files for these samples were downloaded from the ENA website and used as is.

Stellar was called as:

nextflow run main.nf -dsl1 -profile standard --build b37 --gene cyp2d6 

For NA10847 the output was:

 --------------------------------------------

CYP2D6 Star Allele Calling with StellarPGx

--------------------------------------------

Initially computed CN = 2

Sample core variants:
42523805~C>T~0/1;42523943~A>G~0/1

Candidate alleles:
['39.v1_41.v1']

Result:
*39/*41

Activity score:
1.5

Metaboliser status:
Normal metaboliser (NM)

For NA19920:

--------------------------------------------

CYP2D6 Star Allele Calling with StellarPGx

--------------------------------------------

Initially computed CN = 3

Sample core variants:
42522613~G>C~0/1;42523507~A>C~0/1;42523943~A>G~1/1;42524947~C>T~0/1;42526694~G>A~0/1

Candidate alleles:
[1.v1_4.v2]

Result:
Possible novel allele or suballele present: interpret with caution; experimental validation and expert review through PharmVar is recommended

Likely background alleles:
[*1/*4]

Activity score:
Indeterminate

Metaboliser status:
Indeterminate

For NA19908:

--------------------------------------------

CYP2D6 Star Allele Calling with StellarPGx

--------------------------------------------

Initially computed CN = 2

Sample core variants:
42522613~G>C~0/1;42523943~A>G~0/1;42525077~C>T~0/1;42526717~C>T~0/1

Candidate alleles:
['1.v1_46.v1', '43.v1_45.v1']

Result:
*43/*45

Activity score:
1.5

Metaboliser status:
Normal metaboliser (NM)

Is this an error on my end?

Thank you.

twesigomwedavid commented 1 year ago

@VasilisFlouris, Thanks for raising this. It's not an error on your side. Some updates to the StellarPGx version we published in CPT have (inevitably) affected a few calls.

For NA10847, the 42522613G>C variant is called in the initial steps. However, it is subsequently filtered out because it has a Quality by Depth (QD) = 8.7. The current StellarPGx version considers variants with QD ≥ 10. If the QD filter in the main.nf file is relaxed to say 5, then the allele call will be 1/41 as per the original call. However, relaxing thresholds could also affect calls in other samples (could be non-GETRM) as this region (and the CYP2D locus in general) is prone to artefactual variants resulting from read misalignments across CYP2D6 and CYP2D7.

For NA19920, the copy number is still correctly identified. However, the current StellarPGx version flags 42523507~A>C as a potential novel core variant. This variant is supported by a QD = 23.5 and AD=21/85 (not so uncommon for samples with a duplication). Therefore, that's why StellarPGx flagged that there is a potential novel star allele. Of course, this would need experimental validation and again highlights challenges relating to variant calling in this region.

For the NA19908, the quick tests I've run point to some subvariants being filtered out according to the more stringent cutoffs in the current StellarPGx version. These were used to discriminate 1/46 from 43/45 variant combinations in the first version.