Illumina / Cyrius

A tool to genotype CYP2D6 with WGS data

underscores in diplotype calls #18

Closed by mgonzalezporta 2 years ago

mgonzalezporta commented 2 years ago

Hi Xiao,

We've come across the following output from Cyrius where multiple solutions have been detected:

Sample  Genotype        Filter
SAM123  *119_*2;*1_*41  More_than_one_possible_genotype

Would you be able to comment on why the star alleles are separated by "_" instead of "/", even though there are only two alleles per solution?

Many thanks
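
For anyone parsing this column downstream, here is a minimal sketch (not part of Cyrius) of splitting such a Genotype value into candidate diplotypes. It assumes that ";" separates alternative solutions and that either "/" or "_" may separate the two haplotypes, as in the output above. The function name `parse_genotype_field` is hypothetical, and the sketch further assumes that underscores never occur inside a star allele name.

```python
def parse_genotype_field(field):
    """Split a Genotype value into a list of candidate diplotypes.

    Assumptions (not confirmed by the Cyrius documentation):
    ';' separates alternative solutions, and '/' or '_' separates
    the two haplotypes within one solution.
    """
    solutions = []
    for candidate in field.split(";"):
        # Normalise the separator to '/', then split into haplotypes.
        haplotypes = candidate.replace("_", "/").split("/")
        solutions.append(tuple(haplotypes))
    return solutions


print(parse_genotype_field("*119_*2;*1_*41"))
# → [('*119', '*2'), ('*1', '*41')]
```

A value in the Filter column such as More_than_one_possible_genotype would still need to be checked separately; normalising the separator says nothing about how confident the call is.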

xiao-chen-xc commented 2 years ago

Hi Mar, I would expect alleles to be separated by "/" here. Additionally, I would expect Cyrius to be able to distinguish between "*1/*41" and "*119/*2". Based on these two expectations, I'm wondering if you were using the latest version of Cyrius?

mgonzalezporta commented 2 years ago

Hi Xiao,

This is for Cyrius v1.1.1. Attaching the complete json output in case it helps with the troubleshooting, cheers.

SAM3865.json.zip

xiao-chen-xc commented 2 years ago

Hi Mar, this sample's data is of somewhat low quality. The coverage is relatively low, and the coverage MAD (median absolute deviation) across the genome is a bit high, which means less uniform coverage, probably due to the low depth. As a result, the copy number estimation in this sample is off (Cyrius called 3 total copies of D6+D7). If you plot out d67_snp_raw, you can see that the CYP2D6/CYP2D7 CN estimation at each differentiating site is not close to integers, indicating that the total CN estimate is wrong (it should probably be 4 instead of 3).

The genotype call is very low-confidence, so it didn't go through genotype reformatting and ended up with "_" instead of "/" (in theory we should make it "/"; this could be improved in the future). While I'm guessing this sample is probably *1/*41, I almost think that we should make a no-call here, to distinguish the level of confidence in this scenario from one where the copy number call is confident (for example, 4 total copies of D6+D7, sample called as *1/*46;*43/*45).

In conclusion, this is an edge case with low coverage, and the call is low confidence (perhaps better reported as a no-call).
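
The "not close to integers" check described above can also be done numerically rather than by eye. The following is a rough sketch, not Cyrius code: it takes a list of per-site raw copy number values (how they are extracted from the JSON output is left out here) and reports the median distance to the nearest integer. The function name `integer_deviation` and the example values are hypothetical and are not taken from the attached sample.

```python
import statistics


def integer_deviation(raw_cn_values):
    """Median absolute distance of each value from its nearest integer.

    Values near 0 suggest the per-site estimates sit close to integers;
    values approaching 0.5 suggest the total copy number may be wrong.
    """
    distances = [abs(v - round(v)) for v in raw_cn_values if v is not None]
    return statistics.median(distances)


# Hypothetical per-site values, for illustration only.
well_behaved = [2.02, 1.97, 2.05, 1.99, 2.01]
suspicious = [1.52, 1.44, 1.61, 1.38, 1.55]

print(integer_deviation(well_behaved))
print(integer_deviation(suspicious))
```

Any cut-off for flagging a sample would be a judgment call and would need to be tuned on data with known copy numbers.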

mgonzalezporta commented 2 years ago

Many thanks Xiao, I realised I hadn't closed the ticket.