SBIMB / StellarPGx

Calling star alleles in highly polymorphic pharmacogenes (e.g. CYP450 genes) by leveraging genome graph-based variant detection.
MIT License
30 stars 6 forks source link

CYP2D6 call_stars terminated with an error exit status #4

Closed mgonzalezporta closed 2 years ago

mgonzalezporta commented 3 years ago

Describe the bug The call_stars process for CYP2D6 exits with error, resulting in a truncated cyp2d6.alleles file. Here the error log:

$ cat .command.err
Traceback (most recent call last):
  File "bin/stellarpgx.py", line 588, in <module>
    phased_dup = dup_test_cn_n(sv_dup, hap_dbs, snv_cand_alleles[0], snv_cand_alleles[1], snv_def_alleles[0], snv_def_alleles[1], cn, av_cov, in_list)
  File "/data/SG10K-CYP2D6/sandbox/stellarpgx/launchdir/StellarPGx/scripts/cyp2d6/hg38/bin/sv_modules.py", line 254, in dup_test_cn_n
    max_het = max(test_list2)
ValueError: max() arg is an empty sequence

$ cat WHB10006_cyp2d6.alleles
--------------------------------------------

CYP2D6 Star Allele Calling with StellarPGx

--------------------------------------------

Initially computed CN = 8

Sample core variants:
42126611~C>G~1/1;42130692~G>A~1/1

Candidate alleles:
['10.v1_10.v1']

Result:

The trigger seems to be the fact that there are no het variants genotyped for this sample, which makes test_list2 an empty list.

To Reproduce Here the command used to launch StellarPGx for the conflicting sample:

nextflow run main.nf -profile standard -c /data/SG10K-CYP2D6/sandbox/stellarpgx/WHB10006/nextflow.config --format compressed --build hg38 --gene cyp2d6 -resume

Expected behavior Our suggestion would be to issue a no call, as opposed to exiting with an error code.

twesigomwedavid commented 3 years ago

@mgonzalezporta Thanks for the useful suggestion. I will add the 'no call' assignment for such cases.

I see that the computed copy number for WHB10006 sample is 8 and the background alleles are CYP2D610/10. This individual is likely to have one of the more complex 36+10xN or 36xN+10xN alleles, especially since they are part of SG10K. We are still working on enhancements for StellarPGx to be able to call these more complex hybrid tandem rearrangements where CN>4. Given the complexity of CYP2D6 allele combinations in SG10K, I would suggest you use Aldy and Cyrius as well for your CYP2D6 allele calling and generate a consensus call set.

mgonzalezporta commented 3 years ago

Thanks for the quick response David. We are indeed inspecting the consensus across StellarPGx, Cyrius and Aldy. FYI, for the sample here Cyrius outputs a no call, and aldy reports 36 and 10 with multiple copies (*10+*10+*10+*10+*10+*10+*36.ALDY+*10/*36.ALDY+*10). Looking forward to the enhancement.

pmginacio commented 2 years ago

I have similar problems for CYP2D6 on multiple samples. I hope it is related to the initial post.

python3 bin/stellarpgx.py b37/diplo_db_debugged2.dbs V350029497_L02_MANsbdpX078095-736-1_vars/V350029497_L02_MANsbdpX078095-736-1_core_snvs.dip V350029497_L02_MANsbdpX078095-736-1_vars/V350029497_L02_MANsbdpX078095-736-1_full.dip V350029497_L02_MANsbdpX078095-736-1_vars/V350029497_L02_MANsbdpX078095-736-1_gt.dip b37/genotypes4.dbs V350029497_L02_MANsbdpX078095-736-1_gene_del/V350029497_L02_MANsbdpX078095-736-1_gene_del_summary.txt V350029497_L02_MANsbdpX078095-736-1_gene_dup/V350029497_L02_MANsbdpX078095-736-1_gene_dup_summary.txt V350029497_L02_MANsbdpX078095-736-1_cyp2d6_dp b37/haps_var_new.dbs b37/a_scores.dbs
--------------------------------------------

CYP2D6 Star Allele Calling with StellarPGx

--------------------------------------------

Initially computed CN = 15

Sample core variants:
42522613~G>C~0/1;42523943~A>G~0/1;42526763~C>T~0/1

Candidate alleles:
['1.v1_35.v1']

Result:
check

Activity score:
Traceback (most recent call last):
  File "bin/stellarpgx.py", line 670, in <module>
    ac_score = get_ac_score(act_score, gene_alleles)
  File "bin/stellarpgx.py", line 645, in get_ac_score
    p_allele = allele_dict[m_allele] + "_" + n_allele
KeyError: 'check'

The crash happens because function dup_test_cn_n runs to the end of the if-chain and returns 'check' which cannot be handled by get_ac_score().

A quick fix was to replace in line stellarogx.py:664

`if gene_alleles == "":
    ac_score = "Indeterminate"
    print(ac_score)

with

if gene_alleles in ["", "check"]:    
    ac_score = "Indeterminate"
    print(ac_score)

Is this appropriate, or does gene_alleles == 'check' demand a more informative output?

twesigomwedavid commented 2 years ago

@mgonzalezporta – Thanks again for opening this issue. I have addressed the error and 36xN+10xN calling with 3ab19a9

twesigomwedavid commented 2 years ago

@pmginacio – Thanks for the suggestion at stellarpgx.py:664. We had included the 'check' output to flag any samples that require visual/ in-depth inspection. For instance, the V350029497_L02_MANsbdpX078095-736-1 sample you are analysing seems to have CN=15 which is rare. May you try running it again with the new commit. Thanks.

twesigomwedavid commented 2 years ago

@mgonzalezporta and @pmginacio – Thanks again for pointing out these bugs. I am closing this issue – feel free to reopen it in case of anything else related to this.