SBIMB / StellarPGx

Calling star alleles in highly polymorphic pharmacogenes (e.g. CYP450 genes) by leveraging genome graph-based variant detection.

Process `call_stars (SIM001)` terminated with an error exit status (1) #1

Closed by roman-tremmel 3 years ago

roman-tremmel commented 3 years ago

NOTE: Process call_stars (SIM001) terminated with an error exit status (1) -- Error is ignored

> nextflow run main.nf -profile standard,test
N E X T F L O W  ~  version 21.04.0
Launching `main.nf` [jolly_nightingale] - revision: 1af3facac9
executor >  local (11)
[84/f83671] process > call_snvs1 (1)        [100%] 1 of 1 ✔
[ee/c102c5] process > call_snvs2 (1)        [100%] 1 of 1 ✔
[e0/3c3ef2] process > call_sv_del (1)       [100%] 1 of 1 ✔
[dc/9230a9] process > call_sv_dup (1)       [100%] 1 of 1 ✔
[d6/f25d49] process > get_depth (1)         [100%] 1 of 1 ✔
[78/228db3] process > format_snvs (1)       [100%] 1 of 1 ✔
[70/2b8244] process > get_core_var (SIM001) [100%] 1 of 1 ✔
[de/5fa45c] process > analyse_1 (SIM001)    [100%] 1 of 1 ✔
[e3/eab42a] process > analyse_2 (SIM001)    [100%] 1 of 1 ✔
[5f/c0465a] process > analyse_3 (SIM001)    [100%] 1 of 1 ✔
[a4/23752c] process > call_stars (SIM001)   [100%] 1 of 1, failed: 1 ✔
[a4/23752c] NOTE: Process `call_stars (SIM001)` terminated with an error exit status (1) -- Error is ignored

The error log in the corresponding work dir a4 says:

Traceback (most recent call last):
  File "bin/stellarpgx.py", line 29, in <module>
    cn = get_total_CN(cov_file)[0]
  File "/home/tremmel/StellarPGx/scripts/cyp2d6/hg38/bin/sv_modules.py", line 24, in get_total_CN
    av_2d7_ex2_in8 = float(all_reg[9][3])/(float(all_reg[9][2]) - float(all_reg[9][1]))
IndexError: list index out of range
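The failing expression divides one field of a row by the difference of two others, so the IndexError means either that the parsed table has fewer rows than the script expects or that a row has fewer than four fields. A minimal sketch of the same calculation with explicit checks follows. It assumes each row is laid out as (contig, start, end, summed depth), which is only inferred from the expression in the traceback; `read_regions` and `mean_depth` are hypothetical helper names, not part of StellarPGx.

```python
def read_regions(text):
    """Split non-empty, whitespace-delimited lines into lists of fields."""
    return [line.split() for line in text.splitlines() if line.strip()]


def mean_depth(all_reg, index):
    """Return summed depth divided by region length for row `index`.

    Assumed row layout (not confirmed by the source): contig, start, end, depth.
    Raises ValueError with a descriptive message instead of a bare IndexError.
    """
    if index >= len(all_reg):
        raise ValueError(
            f"coverage table has {len(all_reg)} rows, but row {index} was requested"
        )
    row = all_reg[index]
    if len(row) < 4:
        raise ValueError(f"row {index} has {len(row)} fields, expected at least 4")
    start, end, depth = float(row[1]), float(row[2]), float(row[3])
    if end <= start:
        raise ValueError(f"row {index} has a non-positive length ({start}..{end})")
    return depth / (end - start)
```

Calling `mean_depth(rows, 9)` on a table with only a handful of rows then reports how many rows were actually parsed, which points to an incomplete coverage file and not to the Python version.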

The results directory contains only one VCF file, SIM001_cyp2d6.vcf.gz.

I used

Is a specific Python version required?

twesigomwedavid commented 3 years ago

Hi @roman-tremmel,

Thanks for spotting this, and sorry for the late response. I have updated the config file specific to the test run (see commit d86af90) to match the recent enhancements to the main.nf script.

The Python version shouldn't be an issue as all dependencies are included in the Singularity/Docker container.

ajpar94 commented 2 years ago

Hi @twesigomwedavid,

I have the same issue as the one described above (call_stars fails; Error is ignored). For me, it is caused by a different line in the same Python function (get_total_CN(cov_file)).

The corresponding error log shows:

Traceback (most recent call last):
  File "bin/stellarpgx.py", line 28, in <module>
    cn = get_total_CN(cov_file)[0]
  File "/root/ajit/StellarPGx/scripts/cyp2d6/b37/bin/sv_modules.py", line 25, in get_total_CN
    av_2d7_ex2_in8 = float(all_reg[10][3])/(float(all_reg[10][2]) - float(all_reg[10][1]))
IndexError: list index out of range

I ran nextflow run main.nf -profile standard --build hg19 --gene cyp2d6 with Docker, after editing nextflow.config to define the location of the reference genome and the BAM files.

Any idea what is causing this error and how to fix it?

twesigomwedavid commented 2 years ago

Hi @ajpar94,

The issue could be that your reference genome does not match the reference to which the reads in the BAM file are aligned.

Could you confirm the naming of the contigs in your reference genome vs. the ones in the BAM header? Check whether they carry the chr prefix or not.

Thanks
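The contig-name comparison suggested above can be sketched as follows. This is a minimal illustration, assuming the BAM header has been dumped to text (for example with `samtools view -H`) and the reference has a `.fai` index whose first tab-separated column is the contig name. The function names are hypothetical and not part of StellarPGx.

```python
def bam_header_contigs(header_text):
    """Collect the SN: values from the @SQ lines of a SAM/BAM header."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split()[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def fai_contigs(fai_text):
    """Collect contig names from the first column of a .fai index."""
    return [line.split("\t")[0] for line in fai_text.splitlines() if line.strip()]


def compare_contigs(bam_names, ref_names):
    """Report names that appear on only one side of the comparison."""
    bam_set, ref_set = set(bam_names), set(ref_names)
    return {
        "only_in_bam": sorted(bam_set - ref_set),
        "only_in_reference": sorted(ref_set - bam_set),
    }
```

If both lists in the result are empty, the names agree; a BAM with chr1 against a reference with 1 shows up as entries on both sides.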

ajpar94 commented 2 years ago

Thanks @twesigomwedavid for getting back to me so quickly.

Yes, both the BAM file and the reference genome use the chr prefix. I actually changed the header lines in the reference genome manually to use that prefix, i.e. I changed >1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 to >chr1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 (and likewise for the other chromosomes).

After that I generated a new fa.fai file.

twesigomwedavid commented 2 years ago

@ajpar94 No worries.

That's a bit strange. Usually the contigs are just supposed to be >chr1 up to >chr22 (plus alternates if applicable) or simply >1 to >22. I think the fact that you have the long string, i.e. dna:chromosome chromosome:GRCh37:1:1:249250621:1, is causing the issue.

StellarPGx is coded to expect standard GRCh37 (b37 and hg19), or GRCh38 contigs.

Do the BAM files also have chr1 dna:chromosome chromosome:GRCh37:1:1:249250621:1 or simply chr1 etc?
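A minimal sketch of trimming each FASTA header down to the bare contig name, as described above, is shown here. It keeps only the first whitespace-delimited token after `>` and leaves sequence lines untouched. `strip_fasta_descriptions` and `rewrite_fasta` are hypothetical helper names, and the file paths passed to them are placeholders.

```python
def strip_fasta_descriptions(lines):
    """Yield FASTA lines with every header reduced to its first token."""
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            yield ">" + name + "\n"
        else:
            # Sequence lines pass through unchanged.
            yield line


def rewrite_fasta(src_path, dst_path):
    """Stream src_path to dst_path with trimmed headers (paths are placeholders)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(strip_fasta_descriptions(src))
```

After rewriting, the index needs to be regenerated (for example with `samtools faidx`) so that it matches the new file.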

ajpar94 commented 2 years ago

The BAM file header looks like this:

@HD     VN:1.5  SO:coordinate
@SQ     SN:chr1 LN:249250621
@SQ     SN:chr2 LN:243199373
@SQ     SN:chr3 LN:198022430
...

twesigomwedavid commented 2 years ago

Exactly what I thought.

Your reference file also needs to have contigs named chr1, chr2, ..., chr22 rather than the long >chr1 dna:chromosome chromosome:GRCh37:1:1:249250621:1,

i.e.

>chr1
NNNN...
...
...
>chr2
NNNN...
...
...
etc.

ajpar94 commented 2 years ago

Unfortunately, that did not solve it. I still get the same error:

Traceback (most recent call last):
  File "bin/stellarpgx.py", line 28, in <module>
    cn = get_total_CN(cov_file)[0]
  File "/root/ajit/StellarPGx/scripts/cyp2d6/b37/bin/sv_modules.py", line 25, in get_total_CN
    av_2d7_ex2_in8 = float(all_reg[10][3])/(float(all_reg[10][2]) - float(all_reg[10][1]))
IndexError: list index out of range

twesigomwedavid commented 2 years ago

Are you using whole genome sequence data for your analysis? If so, what's the coverage?

It might be that some regions are missing read coverage in your sample(s).

ajpar94 commented 2 years ago

I was using whole exome sequencing data. In my case, defining custom regions within resources/cyp2d6/cyp_hg19/test3.bed has solved the issue. Thank you for your help!

twesigomwedavid commented 2 years ago

@ajpar94, thanks for the information on this. Please note that, as of version 1.2.6, we haven't yet validated the use of exome data as input for StellarPGx. Using WES would therefore likely result in erratic star allele calling, especially for CYP2D6.