fritzsedlazeck / Sniffles

Structural variation caller using third generation sequencing

process cannot end with warnings #317

Open maxineliu opened 2 years ago

maxineliu commented 2 years ago

python 3.7.9 sniffles 2.0.6

I ran Sniffles on a cluster with a BAM file that has a CSI index. The process always gets stuck at "building of index for XXX failed", and I don't know why. Here are the commands and the output:

module load python/3.7
virtualenv --no-download p37
source p37/bin/activate
pip install --no-index --upgrade pip
pip install ~/sniffles-2.0.6-py3-none-any.whl

sniffles --input /home/maxine91/scratch/sorted_align_bam.dir/aln.sorted.cy201704.bam \
    --vcf cy201704.vcf.gz \
    --snf cy201704.snf \
    --tandem-repeats /home/maxine91/projects/def-jfu/data/bufo_genome/01.repeat_annotation/Trf.bed \
    --reference /home/maxine91/projects/def-jfu/data/bufo_genome/genome.fa \
    -t 31

Output:

Running Sniffles2, build 2.0.6
  Run Mode: call_sample
  Start on: 2022/05/04 23:40:41
  Working dir: /scratch/maxine91/call.dir
  Used command: /home/maxine91/p37/bin/sniffles --input /home/maxine91/scratch/sorted_align_bam.dir/aln.sorted.cy201704.bam --vcf cy201704.vcf.gz --snf cy201704.snf --tandem-repeats /home/maxine91/projects/def-jfu/data/bufo_genome/01.repeat_annotation/Trf.bed --reference /home/maxine91/projects/def-jfu/data/bufo_genome/genome.fa -t 31
==============================
Opening for reading: /home/maxine91/scratch/sorted_align_bam.dir/aln.sorted.cy201704.bam
Opening for reading: /home/maxine91/projects/def-jfu/data/bufo_genome/01.repeat_annotation/Trf.bed (tandem repeat annotations for 746 contigs)
Opening for reading: /home/maxine91/projects/def-jfu/data/bufo_genome/genome.fa
Opening for writing: cy201704.vcf.gz (single-sample, sorted, bgzipped, tabix-indexed)
Opening for writing: cy201704.snf
Info: 746 of 747 contigs in the input sample have associated tandem repeat annotations.

Analyzing 54732154 alignments total...

 3452674/54732154 alignments processed (6%, 15013/s); 737/747 tasks done; parallel 10/31; 457309 candidates. 99616 SVs. 
54732154/54732154 alignments processed (100%, 34536/s); 747/747 tasks done; parallel 0/31; 7416274 candidates. 1740031 SVs. 
Took 1584.74s.

WARNING: Unable to assign call at original_scaffold_2172_pilon:-8 to unambiguous task. (got 0 intervals). SVCall=SVCall(contig='original_scaffold_2172_pilon', pos=-8, id='DEL.1B4S131', ref='N', alt='<DEL>', qual=58, filter='GT', info={'STDEV_POS': 9.451631252505216, 'STDEV_LEN': 47.57099956906519, 'AF': 0.1282051282051282}, svtype='DEL', svlen=-1039, end=1031, genotypes={0: (0, 0, 44, 34, 5, None)}, precise=False, support=5, rnames=None, qc=True, nm=-1, postprocess=None, fwd=2, rev=3, coverage_upstream=None, coverage_downstream=43, coverage_start=None, coverage_center=37, coverage_end=41)
ERROR: 1 calls ignored, but only 0 were reassigned to correct tasks
Generating index for cy201704.vcf.gz...
[E::hts_idx_check_range] Region 872550..587513178 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6
Traceback (most recent call last):
  File "/home/maxine91/p37/bin/sniffles", line 613, in <module>
    Sniffles2_Main(config.from_cmdline(),processes)
  File "/home/maxine91/p37/bin/sniffles", line 588, in Sniffles2_Main
    pysam.tabix_index(config.vcf,preset="vcf",force=True)
  File "pysam/libctabix.pyx", line 1035, in pysam.libctabix.tabix_index
OSError: building of index for cy201704.vcf.gz failed

The process always gets stuck at "building of index for cy201704.vcf.gz failed". It also never exits unless I force it to quit manually.

So, my questions are,

  1. Do the warning (WARNING: Unable to assign call at original_scaffold_2172_pilon:-8 to unambiguous task. (got 0 intervals).) and the error (ERROR: 1 calls ignored, but only 0 were reassigned to correct tasks) matter?
  2. Why does index generation fail?
  3. Is the process hanging because index building failed?

Thanks very much for helping.

Maxine

wdecoster commented 2 years ago

The index is built with tabix, whose default is a TBI index. TBI cannot store coordinates beyond 2^29 (about 537 Mbp), so it fails for very long chromosomes. Since you are already using a CSI index for your BAM, you probably have such long chromosomes.
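To make that limit concrete, here is a rough sketch that checks contig lengths from a FASTA index (.fai) against the 2^29 bound. The names `TBI_MAX_COORD` and `contigs_needing_csi`, and the two sample contigs, are made up for illustration; only the 587513178 coordinate is taken from the error message in the log.

```python
TBI_MAX_COORD = 2 ** 29  # 536,870,912: largest coordinate a TBI index can address


def contigs_needing_csi(fai_lines):
    """Return {contig: length} for contigs too long for a TBI index.

    fai_lines: iterable of lines in samtools .fai format
    (tab-separated: name, length, offset, linebases, linewidth).
    """
    too_long = {}
    for line in fai_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, length = line.split("\t")[:2]
        if int(length) > TBI_MAX_COORD:
            too_long[name] = int(length)
    return too_long


# Hypothetical .fai content: one short scaffold and one contig as long as
# the end coordinate reported by hts_idx_check_range.
example_fai = [
    "short_scaffold\t250000\t30\t60\t61",
    "chr_long\t587513178\t254300\t60\t61",
]
print(contigs_needing_csi(example_fai))
```

On a real genome this could be fed with `open("genome.fa.fai")`, assuming the reference has been indexed with `samtools faidx`.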

Relevant line of code: https://github.com/fritzsedlazeck/Sniffles/blob/385da00913c7a2b9988a008a99bc7976476ee0ac/src/sniffles/sniffles#L588

That line could be changed to pass csi=True, which would fix your particular problem. However, I am unsure whether the developers would want to do this by default (for all users, including those for whom a TBI is fine), behind a flag (a --csi option on the command line), or with a try-except: if building a TBI index fails, try to build a CSI index. @fritzsedlazeck, if you could use some help implementing any of those solutions, let me know.
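The try-except option could look roughly like the sketch below. It is not the Sniffles implementation: `index_vcf` and its `indexer` parameter are hypothetical, added only so the fallback logic can be tried without pysam or a real VCF. The `pysam.tabix_index` call and its `preset`, `force`, and `csi` arguments match the traceback in this thread.

```python
def index_vcf(path, indexer=None):
    """Index a bgzipped VCF, preferring TBI and falling back to CSI.

    indexer defaults to pysam.tabix_index; it is a parameter only so the
    fallback can be exercised with a stand-in function.
    Returns "tbi" or "csi" depending on which index was built.
    """
    if indexer is None:
        import pysam  # imported lazily so the sketch loads without pysam
        indexer = pysam.tabix_index
    try:
        indexer(path, preset="vcf", force=True)
        return "tbi"
    except OSError:
        # pysam raises OSError ("building of index for ... failed") when
        # the coordinates do not fit into a TBI index; retry with CSI.
        indexer(path, preset="vcf", force=True, csi=True)
        return "csi"
```

The trade-off is that users with short chromosomes still get the TBI index they are used to, while long-chromosome genomes pay for one failed attempt before the CSI index is built.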

maxineliu commented 2 years ago

Thank you for the help! So should pysam.tabix_index(config.vcf,preset="vcf",force=True) be changed to pysam.tabix_index(config.vcf,preset="vcf",force=True,csi=True)?

I ran Sniffles again with that change, and the warning and error look the same as before:

Running Sniffles2, build 2.0.6
  Run Mode: call_sample
  Start on: 2022/05/05 14:49:18
  Working dir: /scratch/maxine91/call.dir
  Used command: /home/maxine91/p37/bin/sniffles --input /home/maxine91/scratch/sorted_align_bam.dir/aln.sorted.cy201704.bam --vcf cy201704.vcf.gz --snf cy201704.snf --tandem-repeats /home/maxine91/projects/def-jfu/data/bufo_genome/01.repeat_annotation/Trf.bed --reference /home/maxine91/projects/def-jfu/data/bufo_genome/genome.fa -t 31
==============================
Opening for reading: /home/maxine91/scratch/sorted_align_bam.dir/aln.sorted.cy201704.bam
Opening for reading: /home/maxine91/projects/def-jfu/data/bufo_genome/01.repeat_annotation/Trf.bed (tandem repeat annotations for 746 contigs)
Opening for reading: /home/maxine91/projects/def-jfu/data/bufo_genome/genome.fa
Opening for writing: cy201704.vcf.gz (single-sample, sorted, bgzipped, tabix-indexed)
Opening for writing: cy201704.snf
Info: 746 of 747 contigs in the input sample have associated tandem repeat annotations.

Analyzing 54732154 alignments total...

 3452674/54732154 alignments processed (6%, 13482/s); 737/747 tasks done; parallel 10/31; 457309 candidates. 99616 SVs. 
54732154/54732154 alignments processed (100%, 31429/s); 747/747 tasks done; parallel 0/31; 7416274 candidates. 1740031 SVs. 
Took 1741.44s.

WARNING: Unable to assign call at original_scaffold_2172_pilon:-8 to unambiguous task. (got 0 intervals). SVCall=SVCall(contig='original_scaffold_2172_pilon', pos=-8, id='DEL.1B4S131', ref='N', alt='<DEL>', qual=58, filter='GT', info={'STDEV_POS': 9.451631252505216, 'STDEV_LEN': 47.57099956906519, 'AF': 0.1282051282051282}, svtype='DEL', svlen=-1039, end=1031, genotypes={0: (0, 0, 44, 34, 5, None)}, precise=False, support=5, rnames=None, qc=True, nm=-1, postprocess=None, fwd=2, rev=3, coverage_upstream=None, coverage_downstream=43, coverage_start=None, coverage_center=37, coverage_end=41)
ERROR: 1 calls ignored, but only 0 were reassigned to correct tasks
Generating index for cy201704.vcf.gz...
[E::hts_idx_check_range] Region 872550..587513178 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6
Traceback (most recent call last):
  File "/home/maxine91/p37/bin/sniffles", line 613, in <module>
    Sniffles2_Main(config.from_cmdline(),processes)
  File "/home/maxine91/p37/bin/sniffles", line 588, in Sniffles2_Main
    pysam.tabix_index(config.vcf,preset="vcf",force=True,csi=True)
  File "pysam/libctabix.pyx", line 1035, in pysam.libctabix.tabix_index
OSError: building of index for cy201704.vcf.gz failed

wdecoster commented 2 years ago

Hmm, searching more about this brought me to https://github.com/pysam-developers/pysam/issues/995 (that bug is fixed in pysam 0.17.0). Can you check which version you have?

maxineliu commented 2 years ago

The pysam version in my environment is 0.16.0.1. So if I use pysam 0.17.0 or later, should I include csi=True in line 588?

wdecoster commented 2 years ago

Yes, and then I expect it to actually build a CSI index rather than ignore that argument as v0.16 does.
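Since the behaviour depends on the installed pysam, a small guard could check the version before relying on csi=True. This is only a sketch: the helper name `csi_flag_honored` is made up, and the 0.17.0 threshold is taken from the discussion above rather than verified against the pysam changelog.

```python
import re


def csi_flag_honored(version):
    """True if this pysam version is expected to honor csi=True.

    Compares the first three numeric components of the version string
    against (0, 17, 0), the release named in this thread as fixed.
    """
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # leading digits only, e.g. "0rc1" -> 0
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts) >= (0, 17, 0)


print(csi_flag_honored("0.16.0.1"))  # False
print(csi_flag_honored("0.17.0"))    # True
```

In a real environment the argument would be `pysam.__version__`.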

maxineliu commented 2 years ago

Hooray! The CSI index was generated successfully! There is just one question left, about this warning:

WARNING: Unable to assign call at original_scaffold_2172_pilon:-8 to unambiguous task. (got 0 intervals). SVCall=SVCall(contig='original_scaffold_2172_pilon', pos=-8, id='DEL.1B4S131', ref='N', alt='<DEL>', qual=58, filter='GT', info={'STDEV_POS': 9.451631252505216, 'STDEV_LEN': 47.57099956906519, 'AF': 0.1282051282051282}, svtype='DEL', svlen=-1039, end=1031, genotypes={0: (0, 0, 44, 34, 5, None)}, precise=False, support=5, rnames=None, qc=True, nm=-1, postprocess=None, fwd=2, rev=3, coverage_upstream=None, coverage_downstream=43, coverage_start=None, coverage_center=37, coverage_end=41)
ERROR: 1 calls ignored, but only 0 were reassigned to correct tasks

Should I do something to prevent it from happening?

wdecoster commented 2 years ago

That seems like something for @smolkmo