SBIMB / StellarPGx

Calling star alleles in highly polymorphic pharmacogenes (e.g. CYP450 genes) by leveraging genome graph-based variant detection.
MIT License
30 stars 7 forks source link

Several errors in a large cohort #33

Open anh151 opened 10 months ago

anh151 commented 10 months ago

Hello, I've managed to run StellarPGx on thousands of samples. I did have to rewrite the nextflow pipeline in python because of restrictions in my environment. I have a set of 11 samples that are all returning the errors shown below.

Error 1. Occurs in 2 samples:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[55], line 1
----> 1 run_stellarpgx("filtered_bams/1992717.bam")

Cell In[33], line 2, in run_stellarpgx(file)
      1 def run_stellarpgx(file):
----> 2     run_command(["python3", "bin/stellarpgx.py", '-r', ref_genome, '-f', file, '-t', graphtyper, '-s', stellarpgx])

Cell In[2], line 5, in run_command(args, shell, return_output, timeout)
      3 def run_command(args, shell=False, return_output=False, timeout=None):
      4     stdout = subprocess.run(args, capture_output=True, text=True, shell=shell, timeout=timeout)
----> 5     check_stdout(stdout)
      6     if return_output:
      7         return stdout.stdout

Cell In[2], line 11, in check_stdout(stdout)
      9 def check_stdout(stdout):
     10     if stdout.returncode != 0:
---> 11         raise RuntimeError(f"Error running command {stdout.args} with error: {stdout.stderr}")

RuntimeError: Error running command ['python3', 'bin/stellarpgx.py', '-r', '/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/lib/Homo_sapiens_assembly38.fasta', '-f', 'filtered_bams/1992717.bam', '-t', '/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/graphtyper', '-s', '/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx'] with error: Traceback (most recent call last):
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 164, in <module>
    call_stars()
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 108, in call_stars
    run_command(cmd, shell=True)
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 12, in run_command
    check_stdout(stdout)
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 17, in check_stdout
    raise RuntimeError(
RuntimeError: Error running command python3 /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/scripts/cyp2d6/hg38/bin/stellarpgx.py /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/diplo_db_debugged2.dbs 1992717_vars/1992717_core_snvs.dip 1992717_vars/1992717_full.dip 1992717_vars/1992717_gt.dip /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/genotypes4.dbs 1992717_gene_del/1992717_gene_del_summary.txt 1992717_gene_dup/1992717_gene_dup_summary.txt 1992717_cyp2d6_ctrl.depth /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/haps_var_new.dbs /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/a_scores.dbs > 1992717_cyp2d6.alleles with error: Traceback (most recent call last):
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/scripts/cyp2d6/hg38/bin/stellarpgx.py", line 307, in <module>
    phased_dup = dup_test_cn_3_4(sv_dup, hap_dbs, snv_cand_alleles[0], snv_cand_alleles[1], snv_def_alleles[0], snv_def_alleles[1], cn, av_cov, in_list)
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/scripts/cyp2d6/hg38/bin/sv_modules.py", line 172, in dup_test_cn_3_4
    index_var = hdb_list.index(var)
ValueError: '42127941~G>A' is not in list

Error 2. Occurs in 4 samples:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[47], line 1
----> 1 run_stellarpgx("filtered_bams/3044637.bam")

Cell In[33], line 2, in run_stellarpgx(file)
      1 def run_stellarpgx(file):
----> 2     run_command(["python3", "bin/stellarpgx.py", '-r', ref_genome, '-f', file, '-t', graphtyper, '-s', stellarpgx])

Cell In[2], line 5, in run_command(args, shell, return_output, timeout)
      3 def run_command(args, shell=False, return_output=False, timeout=None):
      4     stdout = subprocess.run(args, capture_output=True, text=True, shell=shell, timeout=timeout)
----> 5     check_stdout(stdout)
      6     if return_output:
      7         return stdout.stdout

Cell In[2], line 11, in check_stdout(stdout)
      9 def check_stdout(stdout):
     10     if stdout.returncode != 0:
---> 11         raise RuntimeError(f"Error running command {stdout.args} with error: {stdout.stderr}")

RuntimeError: Error running command ['python3', 'bin/stellarpgx.py', '-r', '/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/lib/Homo_sapiens_assembly38.fasta', '-f', 'filtered_bams/3044637.bam', '-t', '/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/graphtyper', '-s', '/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx'] with error: Traceback (most recent call last):
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 164, in <module>
    call_stars()
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 108, in call_stars
    run_command(cmd, shell=True)
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 12, in run_command
    check_stdout(stdout)
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 17, in check_stdout
    raise RuntimeError(
RuntimeError: Error running command python3 /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/scripts/cyp2d6/hg38/bin/stellarpgx.py /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/diplo_db_debugged2.dbs 3044637_vars/3044637_core_snvs.dip 3044637_vars/3044637_full.dip 3044637_vars/3044637_gt.dip /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/genotypes4.dbs 3044637_gene_del/3044637_gene_del_summary.txt 3044637_gene_dup/3044637_gene_dup_summary.txt 3044637_cyp2d6_ctrl.depth /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/haps_var_new.dbs /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/a_scores.dbs > 3044637_cyp2d6.alleles with error: Traceback (most recent call last):
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/scripts/cyp2d6/hg38/bin/stellarpgx.py", line 326, in <module>
    test_68 = hybrid_test_68(sv_dup, cn, av_cov, cn_in1_3pr, in_list)
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/scripts/cyp2d6/hg38/bin/sv_modules.py", line 357, in hybrid_test_68
    index1 = test_list1.index('42130692~G>A')
ValueError: '42130692~G>A' is not in list

Error 3. Occurs in 3 samples:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[41], line 1
----> 1 run_stellarpgx("filtered_bams/2905389.bam")

Cell In[33], line 2, in run_stellarpgx(file)
      1 def run_stellarpgx(file):
----> 2     run_command(["python3", "bin/stellarpgx.py", '-r', ref_genome, '-f', file, '-t', graphtyper, '-s', stellarpgx])

Cell In[2], line 5, in run_command(args, shell, return_output, timeout)
      3 def run_command(args, shell=False, return_output=False, timeout=None):
      4     stdout = subprocess.run(args, capture_output=True, text=True, shell=shell, timeout=timeout)
----> 5     check_stdout(stdout)
      6     if return_output:
      7         return stdout.stdout

Cell In[2], line 11, in check_stdout(stdout)
      9 def check_stdout(stdout):
     10     if stdout.returncode != 0:
---> 11         raise RuntimeError(f"Error running command {stdout.args} with error: {stdout.stderr}")

RuntimeError: Error running command ['python3', 'bin/stellarpgx.py', '-r', '/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/lib/Homo_sapiens_assembly38.fasta', '-f', 'filtered_bams/2905389.bam', '-t', '/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/graphtyper', '-s', '/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx'] with error: Traceback (most recent call last):
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 164, in <module>
    call_stars()
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 108, in call_stars
    run_command(cmd, shell=True)
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 12, in run_command
    check_stdout(stdout)
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 17, in check_stdout
    raise RuntimeError(
RuntimeError: Error running command python3 /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/scripts/cyp2d6/hg38/bin/stellarpgx.py /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/diplo_db_debugged2.dbs 2905389_vars/2905389_core_snvs.dip 2905389_vars/2905389_full.dip 2905389_vars/2905389_gt.dip /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/genotypes4.dbs 2905389_gene_del/2905389_gene_del_summary.txt 2905389_gene_dup/2905389_gene_dup_summary.txt 2905389_cyp2d6_ctrl.depth /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/haps_var_new.dbs /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/a_scores.dbs > 2905389_cyp2d6.alleles with error: Traceback (most recent call last):
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/scripts/cyp2d6/hg38/bin/stellarpgx.py", line 39, in <module>
    snv_def_calls = cand_snv_allele_calling(database, infile, infile_full, infile_full_gt, infile_spec, cn)
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/scripts/cyp2d6/hg38/bin/snv_def_modules.py", line 183, in cand_snv_allele_calling
    min_score = min(score)
ValueError: min() arg is an empty sequence

Error 4. Occurs in 2 samples:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[42], line 1
----> 1 run_stellarpgx("filtered_bams/1056081.bam")

Cell In[33], line 2, in run_stellarpgx(file)
      1 def run_stellarpgx(file):
----> 2     run_command(["python3", "bin/stellarpgx.py", '-r', ref_genome, '-f', file, '-t', graphtyper, '-s', stellarpgx])

Cell In[2], line 5, in run_command(args, shell, return_output, timeout)
      3 def run_command(args, shell=False, return_output=False, timeout=None):
      4     stdout = subprocess.run(args, capture_output=True, text=True, shell=shell, timeout=timeout)
----> 5     check_stdout(stdout)
      6     if return_output:
      7         return stdout.stdout

Cell In[2], line 11, in check_stdout(stdout)
      9 def check_stdout(stdout):
     10     if stdout.returncode != 0:
---> 11         raise RuntimeError(f"Error running command {stdout.args} with error: {stdout.stderr}")

RuntimeError: Error running command ['python3', 'bin/stellarpgx.py', '-r', '/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/lib/Homo_sapiens_assembly38.fasta', '-f', 'filtered_bams/1056081.bam', '-t', '/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/graphtyper', '-s', '/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx'] with error: Traceback (most recent call last):
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 164, in <module>
    call_stars()
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 108, in call_stars
    run_command(cmd, shell=True)
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 12, in run_command
    check_stdout(stdout)
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/stellarpgx.py", line 17, in check_stdout
    raise RuntimeError(
RuntimeError: Error running command python3 /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/scripts/cyp2d6/hg38/bin/stellarpgx.py /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/diplo_db_debugged2.dbs 1056081_vars/1056081_core_snvs.dip 1056081_vars/1056081_full.dip 1056081_vars/1056081_gt.dip /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/genotypes4.dbs 1056081_gene_del/1056081_gene_del_summary.txt 1056081_gene_dup/1056081_gene_dup_summary.txt 1056081_cyp2d6_ctrl.depth /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/haps_var_new.dbs /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/database/cyp2d6/hg38/a_scores.dbs > 1056081_cyp2d6.alleles with error: Traceback (most recent call last):
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/scripts/cyp2d6/hg38/bin/stellarpgx.py", line 29, in <module>
    cn = get_total_CN(cov_file)[0]
  File "/home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/scripts/cyp2d6/hg38/bin/sv_modules.py", line 30, in get_total_CN
    comp_av = av_2d6_cov/av_ctrl_cov
ZeroDivisionError: float division by zero

Thanks, Andrew

twesigomwedavid commented 10 months ago

@anh151,

Thanks for raising this. Unfortunately, as you can imagine it's very difficult for me to help without seeing the full version that you re-coded. I think the best solution is to request your admin to install the Singularity and then use our StellarPGx official versions. I am not sure this Github is really the best place to debug customised versions as that would be intractable on our end.

David

anh151 commented 10 months ago

Hey @twesigomwedavid

This analysis is in All of Us. We are attempting to call CYP2D6 on ~245k samples with several PGx callers to make the calls available to the rest of the researchers using the cohort. I talked to the program's support and data science team about installing singularity, but they couldn't do it because of security reasons. The only options remained were either not use StellarPGx, convert the nextflow pipeline to Cromwell or convert it to Python. Python seemed like the easiest and best option so that's what I went with. Attached is the python script used to run stellarpgx. I have been able to run it successfully on all samples except the 11 with the errors shown above. I have also confirmed that many of the samples have agreeing calls across multiple callers. If you are unable to address these errors then we can just report the 11 samples as Indeterminate. stellarpgx.py.zip

twesigomwedavid commented 10 months ago

@anh151,

I think the easier solution would have been to install the tools within the StellarPGx workflow separately. That way, no rewrite would have been needed (except disabling Singularity in the nextflow.config file). Is this still possible by any chance? Since you're running the tools using Python, I imagine that all of them are installed on your system already and can be called upon using Nextflow as the current StellarPGx version is set up to do.

Thanks

David

anh151 commented 10 months ago

@twesigomwedavid That is a better idea that I didn't think of. Now implemented your suggestion. I turned off all docker and singularity in the config file. Managed to run it successfully on a sample that should succeed and the call is correct. What information do you need to help you debug? Do you need the entire work directory for each sample or just the final call_stars function that is failing?

Versions just incase it's important. I can install other versions of anything, just wasn't sure what was needed. I just used what was preinstalled: bcftools: 1.12 tabix 1.10 graphtyper: 2.7.6 python: 3.10.12

Thanks, Andrew

twesigomwedavid commented 10 months ago

@anh151, It depends, are you still getting the same errors even after implementing this suggestion? If you are, then the more relevant info would be in the .command.err file in the work directory of the particular sample. Also, be sure to update to the latest StellarPGx version.

Kind regards, David

anh151 commented 10 months ago

@twesigomwedavid Thanks for all of the help.

Yes I am still getting the same errors. Attached are the errors for each of the samples. Yes I am using the latest version of StellarPGx. Let me know if you need any other info. errors.zip

Thanks, Andrew

twesigomwedavid commented 10 months ago

@anh151,

I think it might be best to set up a Zoom call to discuss this rather than going back and forth over Github.

Let's set up a time via direct email. My time zone is GMT+2

David

anh151 commented 10 months ago

Hi David, This error is a bit unrelated and I can create a new issue if needed.

In main.nf, shouldn't call_sv_del and call_sv_dup have ${cram_options} for the graphtyper commands? Without it I get the error below because it's trying to access a different reference genome.

Error executing process > 'call_sv_dup (1)'

Caused by:
  Process `call_sv_dup (1)` terminated with an error exit status (1)
Command executed:

  graphtyper genotype_sv lib/Homo_sapiens_assembly38.fasta --sam=wgs_1056081.cram --region=chr22:42126000-42137500 --output
=wgs_1056081_sv_dup res_hg38/sv_test3.vcf.gz

Command exit status:
  1

Command output:
  (empty)
Command error:
  [W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/ac37ec46683600f808cdd41eac1d55cd": Protoc
ol not supported
  [E::cram_get_ref] Failed to populate reference for id 21
  [E::cram_decode_slice] Unable to fetch reference #21 42109689..42137150

  [E::cram_next_slice] Failure to decode slice
  [2024-01-23 22:24:08.524] <error> hts_reader.cpp:252 htslib failed BAM/CRAM reading of wgs_1056081.cram and returned -2

Work dir:
  /home/jupyter/workspaces/pharmacogenomichaplotypecharacterization/bin/StellarPGx/work/ff/2405f069aee4498acf0a6e725024da

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

Thanks, Andrew