SciLifeLab / TIDDIT

TIDDIT - structural variant calling
Other
65 stars 13 forks source link

tiddit_variant error #106

Open islekburak opened 1 year ago

islekburak commented 1 year ago

I used small .bam files to call SVs but the process is aborted everytime. (I have been succeeded for just once). Last time I used Manta's example .bam & reference.fasta files (https://github.com/Illumina/manta/tree/master/src/demo/data) to do it, I received the error that I mention below. On the other hand, I should add that I used the TIDDIT-3.6.0 version. How can we solve this problem, or what is the reason of that issue, could you please enlighten me?

Run Code

$ tiddit --sv --bam /Users/islekbro/Desktop/SV/bams/hg19/MANTA/G15512.HCC1954.1.COST16011_region.bam -o /Users/islekbro/Desktop/SV/deneme3/manta_out --ref /Users/islekbro/Desktop/SV/genomes/manta.fa --fermi2 /Users/islekbro/miniconda3/bin/fermi2 --ropebwt2 /Users/islekbro/miniconda3/bin/ropebwt2

Error

LIBRARY STATISTICS Pair orientation = Forward-Reverse Average Read length = 101.0 Average insert size = 322.6496907216495 Stdev insert size = 57.08645970234663 99.95 percentile insert size = 581.020000000015

Collecting signals on contig: 8 Collecting signals on contig: 11 ('total', 0.3015010356903076) Writing signals to file extracted signals in: -0.3036017417907715 calculated coverage in: 7.124830961227417 [M::main_ropebwt2] inserted 129948 symbols in 0.229 sec, 0.021 CPU sec [M::main_ropebwt2] constructed FM-index in 0.229 sec, 0.021 CPU sec [M::main_ropebwt2] symbol counts: ($, A, C, G, T, N) = (1274, 38888, 25449, 25449, 38888, 0) [M::main_ropebwt2] rld: (tot, $, A, C, G, T, N) = (129948, 1274, 38888, 25449, 25449, 38888, 0) [M::main] Version: r187 [M::main] CMD: /Users/islekbro/miniconda3/bin/ropebwt2 -dNCr /Users/islekbro/Desktop/SV/deneme3/manta_out_tiddit/clips.fa [M::main] Real time: 0.234 sec; CPU: 0.029 sec [M::fm6_unitig] choose prime 123457 [M::main] Version: r178 [M::main] CMD: /Users/islekbro/miniconda3/bin/fermi2 assemble -t 1 -l 81 - [M::main] Real time: 0.309 sec; CPU: 0.078 sec Clip read assembly in: 0.570451021194458 generated clusters in 0.0017549991607666016 Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/3.8/bin/tiddit", line 11, in load_entry_point('tiddit', 'console_scripts', 'tiddit')() File "/Users/islekbro/TIDDIT/tiddit/main.py", line 167, in main variants=tiddit_variant.main(bam_file_name,sv_clusters,args,library,min_mapq,samples,coverage_data,contig_number,max_ins_len) File "tiddit/tiddit_variant.pyx", line 546, in tiddit.tiddit_variant.main File "<__array_function__ internals>", line 180, in percentile File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4166, in percentile return _quantile_unchecked( File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4424, in _quantile_unchecked r, k = _ureduce(a, File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 3725, in _ureduce r = func(a, *kwargs) File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4593, in _quantile_ureduce_func result = _quantile(arr, File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4699, in _quantile take(arr, indices=-1, axis=DATA_AXIS) File "<__array_function__ internals>", line 180, in take File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 190, in take return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode) File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc return bound(args, **kwds)

gavinmonahan commented 5 months ago

Hi, I'm getting a similar error, pasted below.

I am running the most recent version of TIDDIT through nf-core/sarek (singularity). I have also tried installing through conda, upgrading python (3.8 -> 3.10), changing number of threads (1 -> 6), and copying files rather than using sym links, but got the same error. It seems the error is consistantly caused by the step preceding the numpy percentile call, throwing a "Mean of empty slice" error. It was reported here #107 and here nf-core/sarek/issues/1441.

My files are human cram files, aligned using DRAGMAP to T2T, and approximately 15 to 40gb.

Any help would be greatly appreciated!

Command:

tiddit \
    --threads 6 \
    --sv \
    --skip_assembly \
    --bam /scratch/pawsey0933/T2T/sarek_align/preprocessing/markduplicates/D15-1048/D15-1048.md.cram \
    --ref /scratch/pawsey0933/T2T/reference/chm13v2.0.maskedY.rCRS.EBV.fasta \
    -o /scratch/pawsey0933/gmonahan/D15-1048.tiddit2

Error log:

/software/projects/pawsey0933/gmonahan/envs/tiddit/lib/python3.10/site-packages/numpy/lib/function_base.py:520: RuntimeWarning: Mean of empty slice.
  avg = a.mean(axis, **keepdims_kw)
/software/projects/pawsey0933/gmonahan/envs/tiddit/lib/python3.10/site-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
/software/projects/pawsey0933/gmonahan/envs/tiddit/lib/python3.10/site-packages/numpy/core/_methods.py:206: RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/software/projects/pawsey0933/gmonahan/envs/tiddit/lib/python3.10/site-packages/numpy/core/_methods.py:163: RuntimeWarning: invalid value encountered in divide
  arrmean = um.true_divide(arrmean, div, out=arrmean,
/software/projects/pawsey0933/gmonahan/envs/tiddit/lib/python3.10/site-packages/numpy/core/_methods.py:198: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/software/projects/pawsey0933/gmonahan/envs/tiddit/bin/tiddit", line 11, in <module>
    sys.exit(main())
  File "/software/projects/pawsey0933/gmonahan/envs/tiddit/lib/python3.10/site-packages/tiddit/__main__.py", line 127, in main
    library=tiddit_stats.statistics(bam_file_name,args.ref,min_mapq,max_ins_len,n_reads)
  File "/software/projects/pawsey0933/gmonahan/envs/tiddit/lib/python3.10/site-packages/tiddit/tiddit_stats.py", line 48, in statistics
    library["percentile_insert_size"]=numpy.percentile(insert_size, 99.9)
  File "/software/projects/pawsey0933/gmonahan/envs/tiddit/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4283, in percentile
    return _quantile_unchecked(
  File "/software/projects/pawsey0933/gmonahan/envs/tiddit/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4555, in _quantile_unchecked
    return _ureduce(a,
  File "/software/projects/pawsey0933/gmonahan/envs/tiddit/lib/python3.10/site-packages/numpy/lib/function_base.py", line 3823, in _ureduce
    r = func(a, **kwargs)
  File "/software/projects/pawsey0933/gmonahan/envs/tiddit/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4722, in _quantile_ureduce_func
    result = _quantile(arr,
  File "/software/projects/pawsey0933/gmonahan/envs/tiddit/lib/python3.10/site-packages/numpy/lib/function_base.py", line 4831, in _quantile
    slices_having_nans = np.isnan(arr[-1, ...])
IndexError: index -1 is out of bounds for axis 0 with size 0
J35P312 commented 5 months ago

Hello! And sorry for slow reply! I have been traveling during easter! This error indicates that tiddit could not find any paired reads. I have never run tiddit on DRAGMAP, so I suggest that is the issue. Could you send me a small bam file (a few thousand reads), I could use that to solve the issue.

Best regards Jesper

gavinmonahan commented 5 months ago

Hey Jesper, Thanks for looking into it! Here is a small cram file, which returned the same error: D15-1048.md.small.zip The reference I used is the broad chm13v2.0.maskedY.rCRS.EBV.fasta here&prefix=&forceOnObjectsSortingFiltering=false). Thank you 🙂 Gavin

J35P312 commented 5 months ago

Hello! And thanks for the cram file! I have now had a look! I was able to replicate the error, TIDDIT was unnable to find any paired reads. I checked the cram using samtools, and it seems all reads are unaligned:

samtools flagstat D15-1048.md.small.cram

80000 + 0 in total (QC-passed reads + QC-failed reads) 80000 + 0 primary 0 + 0 secondary 0 + 0 supplementary 0 + 0 duplicates 0 + 0 primary duplicates 0 + 0 mapped (0.00% : N/A) 0 + 0 primary mapped (0.00% : N/A) 80000 + 0 paired in sequencing 40000 + 0 read1 40000 + 0 read2 0 + 0 properly paired (0.00% : N/A) 0 + 0 with itself and mate mapped 0 + 0 singletons (0.00% : N/A) 0 + 0 with mate mapped to a different chr 0 + 0 with mate mapped to a different chr (mapQ>=5)

I ran a read through blast and it seems to be human data, maybe there was an issue with the alignment? Can you check your cram file, are there any aligned reads? Meanwhile I will implement a nicer error message in TIDDIT!

Best regards Jesper

gavinmonahan commented 5 months ago

Well that would explain it! Thanks for looking into it - I'll have to work out what happened now...