Open ps-account opened 5 years ago
Here is the backtrace. It might just be a paired-end issue.
#0 0x00007ffff693c428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff693e02a in __GI_abort () at abort.c:89
#2 0x00007ffff74ae8f7 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3 0x00007ffff74b4a46 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007ffff74b3aa9 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5 0x00007ffff74b4458 in __gxx_personality_v0 () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff6ce1573 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#7 0x00007ffff6ce1ad1 in _Unwind_RaiseException () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#8 0x00007ffff74b4ca7 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9 0x0000000000695e7b in thrust::cuda_cub::throw_on_error (status=cudaErrorIllegalAddress,
msg=0x9f1f37 "device free failed") at /home/bla/local/cuda/cuda-10.0/include/thrust/system/cuda/detail/util.h:194
#10 0x00000000006a2196 in thrust::cuda_cub::free<thrust::cuda_cub::tag, thrust::device_ptr<void> > (ptr=...)
at /home/bla/local/cuda/cuda-10.0/include/thrust/system/cuda/detail/malloc_and_free.h:87
#11 0x00000000006a02c1 in thrust::free<thrust::cuda_cub::tag, thrust::device_ptr<void> > (exec=..., ptr=...)
at /home/bla/local/cuda/cuda-10.0/include/thrust/detail/malloc_and_free.h:78
#12 0x000000000069f3d2 in thrust::device_free (ptr=...)
at /home/bla/local/cuda/cuda-10.0/include/thrust/detail/device_free.inl:40
#13 0x000000000072b940 in thrust::device_malloc_allocator<unsigned int>::deallocate (this=0x7fff80d60ab0, p=...,
cnt=1572862) at /home/bla/local/cuda/cuda-10.0/include/thrust/device_malloc_allocator.h:148
#14 0x0000000000728eda in thrust::detail::allocator_traits<thrust::device_malloc_allocator<unsigned int> >::deallocate(thrust::device_malloc_allocator<unsigned int>&, thrust::device_ptr<unsigned int>, unsigned long)::workaround_warnings::deallocate(thrust::device_malloc_allocator<unsigned int>&, thrust::device_ptr<unsigned int>, unsigned long) (a=..., p=..., n=1572862)
at /home/bla/local/cuda/cuda-10.0/include/thrust/detail/allocator/allocator_traits.inl:257
#15 0x0000000000728f07 in thrust::detail::allocator_traits<thrust::device_malloc_allocator<unsigned int> >::deallocate (
a=..., p=..., n=1572862) at /home/bla/local/cuda/cuda-10.0/include/thrust/detail/allocator/allocator_traits.inl:261
#16 0x000000000072628c in thrust::detail::contiguous_storage<unsigned int, thrust::device_malloc_allocator<unsigned int> >::deallocate (this=0x7fff80d60ab0) at /home/bla/local/cuda/cuda-10.0/include/thrust/detail/contiguous_storage.inl:190
#17 0x0000000000725ee8 in thrust::detail::contiguous_storage<unsigned int, thrust::device_malloc_allocator<unsigned int> >::~contiguous_storage (this=0x7fff80d60ab0, __in_chrg=<optimized out>)
at /home/bla/local/cuda/cuda-10.0/include/thrust/detail/contiguous_storage.inl:64
#18 0x0000000000770fe8 in thrust::detail::vector_base<unsigned int, thrust::device_malloc_allocator<unsigned int> >::~vector_base (this=0x7fff80d60ab0, __in_chrg=<optimized out>)
at /home/bla/local/cuda/cuda-10.0/include/thrust/detail/vector_base.inl:497
#19 0x00000000007701aa in thrust::device_vector<unsigned int, thrust::device_malloc_allocator<unsigned int> >::~device_vector (this=0x7fff80d60ab0, __in_chrg=<optimized out>) at /home/bla/local/cuda/cuda-10.0/include/thrust/device_vector.h:78
#20 0x0000000000770854 in nvbio::vector<nvbio::device_tag, unsigned int>::~vector (this=0x7fff80d60ab0,
__in_chrg=<optimized out>) at /home/bla/local/nvBowtie-cuda10/nvbio/nvbio/basic/vector.h:113
#21 0x0000000000774d90 in nvbio::io::SequenceDataStorage<nvbio::device_tag>::~SequenceDataStorage (this=0x7fff80d60a00,
__in_chrg=<optimized out>) at /home/bla/local/nvBowtie-cuda10/nvbio/nvbio/io/sequence/sequence.h:436
#22 0x0000000000768221 in nvbio::bowtie2::cuda::ComputeThreadPE::do_run (this=0x37aee80)
at /home/bla/local/nvBowtie-cuda10/nvbio/nvBowtie/bowtie2/cuda/compute_thread.cu:597
#23 0x00000000007682b5 in nvbio::bowtie2::cuda::ComputeThreadPE::run (this=0x37aee80)
at /home/bla/local/nvBowtie-cuda10/nvbio/nvBowtie/bowtie2/cuda/compute_thread.cu:693
#24 0x0000000000678930 in nvbio::Thread<nvbio::bowtie2::cuda::ComputeThreadPE>::execute (arg=0x37aee80)
at /home/bla/local/nvBowtie-cuda10/nvbio/nvbio/basic/threads.h:116
#25 0x00007ffff7bc16ba in start_thread (arg=0x7fff80d61700) at pthread_create.c:333
#26 0x00007ffff6a0e41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
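A note on reading this backtrace: the exception carries cudaErrorIllegalAddress but is thrown from a device free inside a destructor. CUDA reports kernel faults asynchronously, so the free is probably where the error was noticed rather than where it was caused. One way to narrow this down is to make kernel launches synchronous with the CUDA_LAUNCH_BLOCKING environment variable. The wrapper below is only a sketch; run_blocking is a hypothetical helper name, not part of nvbio.

```shell
# Run any command with synchronous CUDA kernel launches, so that a faulting
# kernel is reported at its own launch instead of at a later API call.
# run_blocking is a hypothetical helper name, not an nvbio tool.
run_blocking() {
    CUDA_LAUNCH_BLOCKING=1 "$@"
}

# Example (not executed here; file names are the ones used in this thread):
# run_blocking nvBowtie -x genome-index -1 ERX3219973_1.fastq -2 ERX3219973_2.fastq -S ERX3219973.bam
```

Running the same command under cuda-memcheck (shipped with CUDA 10.0) or cuda-gdb is another option for locating the offending kernel.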
Now with cuda-gdb. Another odd thing is that this issue happens on Pascal but not on Maxwell. The fault appears to be in the finish-alignment kernel (finish_alignment_kernel):
info : [0] aligning reads [168820736, 169869311]
verbose : [0] 1048576 reads
verbose : [0] 209.715 M bps (300.0 MB)
verbose : [0] 100.0 bps/read (min: 100, max: 100)
verbose : [0] 26.8 K reads/s
info : [0] aligning reads [169869312, 170758330]
verbose : [0] 889019 reads
verbose : [0] 177.764 M bps (254.3 MB)
verbose : [0] 100.0 bps/read (min: 100, max: 100)
CUDA Exception: Warp Out-of-range Address
The exception was triggered at PC 0x562ebd0
Thread 17 "nvBowtie" received signal CUDA_EXCEPTION_5, Warp Out-of-range Address.
[Switching focus to CUDA kernel 156, grid 433544, block (5731,0,0), thread (78,0,0), device 0, sm 18, warp 40, lane 14]
0x000000000562ebf0 in nvbio::bowtie2::cuda::detail::finish_alignment_kernel<nvbio::bowtie2::cuda::detail::BestTracebackStream<0u, nvbio::aln::GotohAligner<(nvbio::aln::AlignmentType)1, nvbio::bowtie2::cuda::SmithWatermanScoringScheme<nvbio::bowtie2::cuda::QualCost<int>, nvbio::bowtie2::cuda::ConstantCost<int> >, nvbio::aln::PatternBlockingTag>, nvbio::bowtie2::cuda::TracebackPipelineState<nvbio::bowtie2::cuda::SmithWatermanScoringScheme<nvbio::bowtie2::cuda::QualCost<int>, nvbio::bowtie2::cuda::ConstantCost<int> > > >, nvbio::bowtie2::cuda::SmithWatermanScoringScheme<nvbio::bowtie2::cuda::QualCost<int>, nvbio::bowtie2::cuda::ConstantCost<int> >, nvbio::bowtie2::cuda::TracebackPipelineState<nvbio::bowtie2::cuda::SmithWatermanScoringScheme<nvbio::bowtie2::cuda::QualCost<int>, nvbio::bowtie2::cuda::ConstantCost<int> > > ><<<(5734,1,1),(96,1,1)>>> ()
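To put the faulting thread in context, its position in the grid can be worked out from the focus line and the launch configuration printed above. This is only a sketch: it assumes the kernel uses the usual 1D mapping blockIdx.x * blockDim.x + threadIdx.x, which the thread does not confirm, and global_thread_id is a hypothetical helper name.

```shell
# Assumption: 1D mapping, global id = blockIdx.x * blockDim.x + threadIdx.x
global_thread_id() {
    # $1 = blockIdx.x, $2 = blockDim.x, $3 = threadIdx.x
    echo $(( $1 * $2 + $3 ))
}

# Values from the cuda-gdb output: block (5731,0,0), thread (78,0,0),
# launch configuration <<<(5734,1,1),(96,1,1)>>>.
faulting=$(global_thread_id 5731 96 78)
launched=$(( 5734 * 96 ))
echo "faulting thread: $faulting of $launched launched"
```

Under that assumption the fault is in thread 550254 of 550464, that is, in one of the last few blocks of the grid. This is consistent with a problem near the end of the batch, but does not prove one.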
How to reproduce the truncated SAM file from a small dataset, assuming you have nvbio installed:
# get arabidopsis from e.g. illumina igenome
wget ftp://igenome:G3nom3s4u@ussd-ftp.illumina.com/Arabidopsis_thaliana/Ensembl/TAIR10/Arabidopsis_thaliana_Ensembl_TAIR10.tar.gz
# unpack
tar -zxvf Arabidopsis_thaliana_Ensembl_TAIR10.tar.gz
cd Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/
# create index
nvBWT -d 1 genome.fa genome-index
cd
# if you don't have it, download sra toolkit from https://www.ncbi.nlm.nih.gov/sra/docs/toolkitsoft/
~/sratoolkit.2.9.6-1-ubuntu64/bin/prefetch -v ERX3219973
~/sratoolkit.2.9.6-1-ubuntu64/bin/fastq-dump --outdir . --split-files $HOME/ncbi/public/sra/ERX3219973.sra
# now run nvBowtie
nvBowtie -x $HOME/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome-index -1 ERX3219973_1.fastq -2 ERX3219973_2.fastq -S ERX3219973.bam
# make sure you have samtools installed, then run
samtools view ERX3219973.bam | tail -n1
[main_samview] truncated file.
ERX3219973.91 ST-J00101:86:HMYKLBBXX:7:1103:9709:41950 length=150 4 * 0 0 * * 0 0AACCGGTGAGACTTCCAATGATTGATTCAAATTAACTTCGAAGCTTCCATTTGTTCTTCACTTTGCTGACTGTGTTTATTGTTGGTTACAGGAAGGCAAGGACAATGTTAGAGTCATAGGTATTTTTCTTGACTTGTCTCAGATAAAGGG AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ
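The "truncated file" message can also be checked without decoding the whole file. Assuming the output really is a BGZF-compressed BAM (inferred here from the .bam extension and the samtools message, not confirmed), a complete file ends with the fixed 28-byte BGZF end-of-file block defined in the SAM/BAM specification. The sketch below tests for that block; has_bgzf_eof is a hypothetical helper name.

```shell
# 28-byte BGZF end-of-file block from the SAM/BAM specification, as hex.
BGZF_EOF=1f8b08040000000000ff0600424302001b0003000000000000000000

# Succeeds if the file ends with the BGZF EOF block, fails otherwise.
has_bgzf_eof() {
    # $1 = path to a BAM (or other BGZF) file
    tail_hex=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
    [ "$tail_hex" = "$BGZF_EOF" ]
}

# Example (not executed here):
# has_bgzf_eof ERX3219973.bam && echo "EOF block present" || echo "truncated"
```

Recent samtools versions provide the same kind of check as samtools quickcheck. A missing EOF block fits the picture of nvBowtie crashing before the output is finalized.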
I get an error at the start of nvBowtie. Could that be related?
info : nvBowtie... started
verbose : cuda devices : 1
verbose : device 0 has compute capability 6.1
verbose : SM count : 20
verbose : SM clock rate : 1733 Mhz
verbose : memory clock rate : 4.5 Ghz
verbose : chosen device 0
verbose : device name : Quadro P5000
verbose : compute capability : 6.1
visible : mapping reference index... started
info : file: "genome-index"
info : SequenceDataMMAP: error mapping file "/nvbio.genome-index.seq_info" (2)!
visible : mapping reference index... failed
visible : loading reference index... started
info : file: "genome-index"
visible : loading reference index... done
visible : FMIndexData: loading... started
visible : genome : genome-index
info : reading bwt... started
info : reading bwt... done
verbose : length: 119667750
info : building occurrence table... started
I have experienced the same problem. Running strace shows the following error:
openat(AT_FDCWD, "/dev/shm/nvbio.hs37d5-index.seq_info", O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = -1 ENOENT (No such file or directory)
EDIT: This error occurs when the shared-memory server is not running. In my case I started it with:
./nvFM-server hs37d5-index hs37d5 &
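Whether an index is currently mapped can be checked before starting nvBowtie. The sketch below assumes the mapped file is named /dev/shm/nvbio.<name>.seq_info, which is inferred only from the strace line above; index_is_mapped is a hypothetical helper name.

```shell
# Assumption: a mapped index shows up as <shm dir>/nvbio.<name>.seq_info,
# as suggested by the strace output in this thread.
index_is_mapped() {
    # $1 = index name as passed to nvBowtie -x
    # $2 = shared-memory directory (defaults to /dev/shm)
    [ -e "${2:-/dev/shm}/nvbio.$1.seq_info" ]
}

if index_is_mapped hs37d5-index; then
    echo "index is mapped"
else
    echo "index is not mapped; nvBowtie will try to load it from disk"
fi
```

In the nvBowtie log earlier in the thread, the failed mapping is followed by "loading reference index... done", so the loader appears to fall back to reading the index from disk. That would make the mapping message a separate matter from the crash.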
Running the code (also with vmiheer's latest version) on a paired-end dataset leads to a crash. gdb seems to indicate that the "opposite alignment kernel" might be where things go wrong...