Closed mdshw5 closed 6 years ago
Just to rule out issues with MKL, I edited the Makefile and removed MKL inclusions and macros. Same issue:
+ /home/shirlma2/.local/bin/jellyfish count -m 27 -s 100M -t 1 -Q 5 RNA-seq/reads_final.1.fastq -o RNA-seq/reads.1.jf
+ /home/shirlma2/.local/bin/jellyfish dump RNA-seq/reads.1.jf -o RNA-seq/reads.1.fa
+ /home/shirlma2/.local/bin/jellyfish count -m 27 -s 100M -t 1 -Q 5 RNA-seq/reads_final.2.fastq -o RNA-seq/reads.2.jf
+ /home/shirlma2/.local/bin/jellyfish dump RNA-seq/reads.2.jf -o RNA-seq/reads.2.fa
+ /home/shirlma2/.local/bin/freePSI build -k 27 -p 1 -g ./genome -a ./annotation/hg38_refGene_exonBoundary_chr21.bed -1 RNA-seq/reads.1.fa -2 RNA-seq/reads.2.fa -o ./hashtable.json
### Start to build theoretical and real kmer profile ...
*** Start to load exon boundary annotation ...
=== There are 316 genes from 1 chromosomes
(If the number of chromosomes is not right, please check whether the annotation is sorted.)
*** Finish loading exon boundary annotation !
Elasped time 0s.
*** Start to build theoretical kmer profile ...
+++ Build theoretical kmer profile from chr21 ...
*** Finish building theoretical kmer profile!
Elasped time 2s.
*** Start to load reads from Jellyfish...
=== Totally 13338528 kmers loaded.
=== Reserve 13338528 high quality Kmers. (Reserve 100%)
=== 219914 kmers failed to match with theoretical kmer profile. (Reserve 98.3513%)
=== Finally collect 1321343 different possible kmers on Genome.
=== Finally collect 326534 different kmers occur in read.
=== About 40 occurrences of kmer on average.
*** Finish loading kmers from reads!
Elasped time 2s.
*** Start to merge kmers which share same profile ...
=== 994809 kmers not occur in reads. (75.2877%).
=== 0 kmers occur less than 0 times in reads. (0%).
=== 319387 kmers share same profile. (24.1714%).
+++ Rehash kmer profile ...
KmerCount load factor: Now 0.950778, before 0.00534304
KmerTable load factor: Now 0.950778, before 0.00333197
=== 7147 kmers reserve after merging!
KmerCountSize: 7147
KmerTableSize: 7147
(Shrinkage rate is 0.540889%).
=== 2324 kmers are originated from unique exon. (32.5171%).
*** Finish merging kmer profile !
Elasped time 1s.
### Finish building theoretical and real kmer profile!
### Finished!
Elasped time 6s.
CPU Time elapsed: 6s.
Natural Time elapsed: 6s.
+ /home/shirlma2/.local/bin/freePSI quant -k 27 -p 1 -i ./hashtable.json -o .
*** Eigen mode:
--- Use intrinsic parallel in Eigen
--- Not use Intel MKL in Eigen
Elasped time 0s.
### Start to initialize parameters ...
*** Start to initialize variable indices ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
*** Finish initializing variable indices!
Elasped time 0s.
*** Start to initialize coefficients ...
+++ Start to initialize effective length ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
+++ Finish initializing effective length!
+++ Start to initialize contribution matrix and kmer count vector ...
> OK!
+++ Finish initializing contribution matrix and kmer count vector!
*** Finish initializing coefficients!
Elasped time 0s.
*** Start to set initial values for variables (Gamma, Theta and Mu)...
+++ Start to allocate kmer count as initial value ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
+++ Finish allocating kmer count as initial value ...
=== On average, 5.1416 exons or juctions share one kmer.
=== 2.01483% kmers are unique to exons in large exon number gene.
=== 38.6076% elements of Z (316 in total) are zero.
=== 63.3393% elements of X (26437 in total) are zero.
=== 1171 elements of X (2398 in total) are zero.
=== 2.21519% genes (7 in total) contain more than 40 exons
+++ Start to do normalization ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
+++ Finish doing normalization!
*** Finish setting initial values for variables (Gamma, Alpha and Mu)!
Elasped time 0s.
*** Start to initialize constraint matrix ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>OK!
*** Finish initializing constraint matrix!
Elasped time 0s.
*** Start to make Alpha feasible...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>OK!
+++ Finish making Alpha feasible!
=== Prune 21.6179% infeaible variables (infeasible junctions) ...
=== 0 new empty genes are found
=== Now there are 7926 variables. (Before 10112)
*** Finish initializing CGPA optimizer and making Alpha feasible...
Elasped time 1s.
### Finish initializing parameters !
Elasped time 1s.
### Start to estimate parameters ...
*** Start to run EM algorithm ...
*** M-step mode:
--- Dont run M-step concurrently
OK!
=== Initial log-likelihood is -6.56896e+07
+++ 1 iteration processed...
+++ E-step ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
Elasped time 0s.
Thread number = 1
+++ M-step ...
Eigen threads = 12
Thread num = 1
>*** glibc detected *** /home/shirlma2/.local/bin/freePSI: free(): invalid next size (normal): 0x00000000026487b0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3b16275e66]
/lib64/libc.so.6[0x3b162789ba]
/home/shirlma2/.local/bin/freePSI[0x42d45e]
/home/shirlma2/.local/bin/freePSI[0x432659]
/home/shirlma2/.local/bin/freePSI[0x4118e4]
/home/shirlma2/.local/bin/freePSI[0x41583a]
/home/shirlma2/.local/bin/freePSI[0x40940a]
/home/shirlma2/.local/bin/freePSI[0x40a003]
/home/shirlma2/.local/bin/freePSI[0x403bda]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x3b1621ed5d]
/home/shirlma2/.local/bin/freePSI[0x403e31]
======= Memory map: ========
00400000-0044d000 r-xp 00000000 00:1e 1826571594 /home/shirlma2/.local/bin/freePSI
0044d000-0044e000 r--p 0004c000 00:1e 1826571594 /home/shirlma2/.local/bin/freePSI
0044e000-0044f000 rw-p 0004d000 00:1e 1826571594 /home/shirlma2/.local/bin/freePSI
01e4c000-0455d000 rw-p 00000000 00:00 0 [heap]
3b15a00000-3b15a20000 r-xp 00000000 09:01 1202416 /lib64/ld-2.12.so
3b15c1f000-3b15c20000 r--p 0001f000 09:01 1202416 /lib64/ld-2.12.so
3b15c20000-3b15c21000 rw-p 00020000 09:01 1202416 /lib64/ld-2.12.so
3b15c21000-3b15c22000 rw-p 00000000 00:00 0
3b15e00000-3b15e83000 r-xp 00000000 09:01 1202422 /lib64/libm-2.12.so
3b15e83000-3b16082000 ---p 00083000 09:01 1202422 /lib64/libm-2.12.so
3b16082000-3b16083000 r--p 00082000 09:01 1202422 /lib64/libm-2.12.so
3b16083000-3b16084000 rw-p 00083000 09:01 1202422 /lib64/libm-2.12.so
3b16200000-3b1638a000 r-xp 00000000 09:01 1202417 /lib64/libc-2.12.so
3b1638a000-3b1658a000 ---p 0018a000 09:01 1202417 /lib64/libc-2.12.so
3b1658a000-3b1658e000 r--p 0018a000 09:01 1202417 /lib64/libc-2.12.so
3b1658e000-3b1658f000 rw-p 0018e000 09:01 1202417 /lib64/libc-2.12.so
3b1658f000-3b16594000 rw-p 00000000 00:00 0
3b16600000-3b16602000 r-xp 00000000 09:01 1202424 /lib64/libdl-2.12.so
3b16602000-3b16802000 ---p 00002000 09:01 1202424 /lib64/libdl-2.12.so
3b16802000-3b16803000 r--p 00002000 09:01 1202424 /lib64/libdl-2.12.so
3b16803000-3b16804000 rw-p 00003000 09:01 1202424 /lib64/libdl-2.12.so
3b16a00000-3b16a17000 r-xp 00000000 09:01 1202420 /lib64/libpthread-2.12.so
3b16a17000-3b16c17000 ---p 00017000 09:01 1202420 /lib64/libpthread-2.12.so
3b16c17000-3b16c18000 r--p 00017000 09:01 1202420 /lib64/libpthread-2.12.so
3b16c18000-3b16c19000 rw-p 00018000 09:01 1202420 /lib64/libpthread-2.12.so
3b16c19000-3b16c1d000 rw-p 00000000 00:00 0
3b16e00000-3b16e07000 r-xp 00000000 09:01 1202421 /lib64/librt-2.12.so
3b16e07000-3b17006000 ---p 00007000 09:01 1202421 /lib64/librt-2.12.so
3b17006000-3b17007000 r--p 00006000 09:01 1202421 /lib64/librt-2.12.so
3b17007000-3b17008000 rw-p 00007000 09:01 1202421 /lib64/librt-2.12.so
7f3bf5575000-7f3bf557a000 rw-p 00000000 00:00 0
7f3bf557a000-7f3bf5590000 r-xp 00000000 00:1d 3658082613 /usr/prog/GCCcore/6.4.0/lib64/libgcc_s.so.1
7f3bf5590000-7f3bf5591000 r--p 00015000 00:1d 3658082613 /usr/prog/GCCcore/6.4.0/lib64/libgcc_s.so.1
7f3bf5591000-7f3bf5592000 rw-p 00016000 00:1d 3658082613 /usr/prog/GCCcore/6.4.0/lib64/libgcc_s.so.1
7f3bf5592000-7f3bf5593000 rw-p 00000000 00:00 0
7f3bf5593000-7f3bf55bf000 r-xp 00000000 00:1d 3678701844 /usr/prog/GCCcore/6.4.0/lib64/libgomp.so.1.0.0
7f3bf55bf000-7f3bf55c0000 r--p 0002b000 00:1d 3678701844 /usr/prog/GCCcore/6.4.0/lib64/libgomp.so.1.0.0
7f3bf55c0000-7f3bf55c1000 rw-p 0002c000 00:1d 3678701844 /usr/prog/GCCcore/6.4.0/lib64/libgomp.so.1.0.0
7f3bf55c1000-7f3bf573c000 r-xp 00000000 00:1d 3678976074 /usr/prog/GCCcore/6.4.0/lib64/libstdc++.so.6.0.22
7f3bf573c000-7f3bf5746000 r--p 0017a000 00:1d 3678976074 /usr/prog/GCCcore/6.4.0/lib64/libstdc++.so.6.0.22
7f3bf5746000-7f3bf574a000 rw-p 00184000 00:1d 3678976074 /usr/prog/GCCcore/6.4.0/lib64/libstdc++.so.6.0.22
7f3bf574a000-7f3bf574e000 rw-p 00000000 00:00 0
7f3bf5773000-7f3bf5776000 rw-p 00000000 00:00 0
7ffe2363b000-7ffe23661000 rw-p 00000000 00:00 0 [stack]
7ffe2370b000-7ffe2370c000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
run_FreePSI.sh: line 53: 115021 Aborted ${FreePSI}/freePSI quant -k $K -p ${THREAD} -i ./hashtable.json -o .
And here's the actual traceback when compiled with debugging symbols and run under gdb:
bash-4.1$ gdb freePSI
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-83.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/shirlma2/.local/bin/freePSI...done.
(gdb) run quant -k 27 -p 1 -i hashtable.json -o .
Starting program: /home/shirlma2/.local/bin/freePSI quant -k 27 -p 1 -i hashtable.json -o .
[Thread debugging using libthread_db enabled]
warning: File "/usr/prog/GCCcore/6.4.0/lib64/libstdc++.so.6.0.22-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
To enable execution of this file add
add-auto-load-safe-path /usr/prog/GCCcore/6.4.0/lib64/libstdc++.so.6.0.22-gdb.py
line to your configuration file "/home/shirlma2/.gdbinit".
To completely disable this security protection add
set auto-load safe-path /
line to your configuration file "/home/shirlma2/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
info "(gdb)Auto-loading safe path"
*** Eigen mode:
--- Use intrinsic parallel in Eigen
--- Not use Intel MKL in Eigen
Elasped time 0s.
### Start to initialize parameters ...
*** Start to initialize variable indices ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
*** Finish initializing variable indices!
Elasped time 0s.
*** Start to initialize coefficients ...
+++ Start to initialize effective length ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
+++ Finish initializing effective length!
+++ Start to initialize contribution matrix and kmer count vector ...
> OK!
+++ Finish initializing contribution matrix and kmer count vector!
*** Finish initializing coefficients!
Elasped time 0s.
*** Start to set initial values for variables (Gamma, Theta and Mu)...
+++ Start to allocate kmer count as initial value ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
+++ Finish allocating kmer count as initial value ...
=== On average, 5.1416 exons or juctions share one kmer.
=== 2.01483% kmers are unique to exons in large exon number gene.
=== 38.6076% elements of Z (316 in total) are zero.
=== 63.3393% elements of X (26437 in total) are zero.
=== 1171 elements of X (2398 in total) are zero.
=== 2.21519% genes (7 in total) contain more than 40 exons
+++ Start to do normalization ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
+++ Finish doing normalization!
*** Finish setting initial values for variables (Gamma, Alpha and Mu)!
Elasped time 0s.
*** Start to initialize constraint matrix ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>OK!
*** Finish initializing constraint matrix!
Elasped time 0s.
*** Start to make Alpha feasible...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>OK!
+++ Finish making Alpha feasible!
=== Prune 21.6179% infeaible variables (infeasible junctions) ...
=== 0 new empty genes are found
=== Now there are 7926 variables. (Before 10112)
*** Finish initializing CGPA optimizer and making Alpha feasible...
Elasped time 0s.
### Finish initializing parameters !
Elasped time 0s.
### Start to estimate parameters ...
*** Start to run EM algorithm ...
*** M-step mode:
--- Dont run M-step concurrently
OK!
=== Initial log-likelihood is -6.56896e+07
+++ 1 iteration processed...
+++ E-step ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> OK!
Elasped time 0s.
Thread number = 1
+++ M-step ...
Eigen threads = 12
Thread num = 1
Program received signal SIGSEGV, Segmentation fault.
) at ../../eigen-git-mirror-3.3.0/Eigen/src/SparseCore/SparseDenseProduct.h:102
102 res.coeffRef(it.index(),c) += it.value() * rhs_j;
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.166.el6_7.1.x86_64
(gdb) backtrace
#0 ConjugateGradientProjection::dropLastHyperplane() (warning: (Internal error: pc 0x42c6dc in read in psymtab, but not in symtab.)
) at ../../eigen-git-mirror-3.3.0/Eigen/src/SparseCore/SparseDenseProduct.h:102
#1 0x0000000000432ba9 in ConjugateGradientProjection::optimizeQ(double, int) (warning: (Internal error: pc 0x432ba8 in read in psymtab, but not in symtab.)
) at ConjugateGradientProjection.cpp:114
#2 0x0000000000411a25 in EMAlgorithm::mStep(double, int) (warning: (Internal error: pc 0x411a24 in read in psymtab, but not in symtab.)
) at EMAlgorithm.cpp:595
#3 0x000000000041596a in EMAlgorithm::work(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (warning: (Internal error: pc 0x415969 in read in psymtab, but not in symtab.)
) at EMAlgorithm.cpp:717
#4 0x00000000004094fa in quant(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) () at freePSI.cpp:139
#5 0x000000000040a0f3 in loadQuant(int, char**) () at freePSI.cpp:319
#6 0x0000000000403cca in main () at freePSI.cpp:339
It looks like this might be an issue with the version of eigen I'm using. I also get segfaults in different places in ConjugateGradientProjection.cpp depending on whether I use -D EIGEN_DONT_PARALLELIZE -D EIGEN_USE_MKL_ALL
or not. In the above case I'm telling Eigen not to use MKL. I thought originally this might have been a simple double free() issue but on closer inspection seems more complicated. Maybe distributing a statically linked binary for #1 will help if this really is a library incompatibility issue.
Here's an archive of the example data (jellyfish outputs and hash table generated by running freePSI build
), along with the binary I compiled (likely won't work on your machine).
$jellyfish --version
jellyfish 2.2.7
$freePSI --version
FreePSI v0.3
Please use following commands to see help information.
freePSI build -h
freePSI quant -h
run_freePSI.sh:
#!/bin/bash
set -e
# Provide the directory containing Jellyfish (usually named as 'bin')
Jellyfish=$HOME/.local/bin
# Provide the directory containing FreePSI
# E.g.
#FreePSI=../bin
FreePSI=$HOME/.local/bin
if [ -z ${Jellyfish} ]; then
echo "Error: Please modify me to provide the directory containing Jellyfish"
exit;
fi
if [ -z ${FreePSI} ]; then
echo "Error: Please modify me to provide the directory containing FreePSI"
exit;
fi
GENOME_DIR=./genome
BND_FILE=./annotation/hg38_refGene_exonBoundary_chr21.bed
READS=RNA-seq
K=27
THREAD=1
set -x
# Count k-mers in RNA-seq reads using jellyfish
${Jellyfish}/jellyfish count -m ${K} -s 100M -t ${THREAD} -Q 5 ${READS}/reads_final.1.fastq -o ${READS}/reads.1.jf
${Jellyfish}/jellyfish dump ${READS}/reads.1.jf -o ${READS}/reads.1.fa
${Jellyfish}/jellyfish count -m ${K} -s 100M -t ${THREAD} -Q 5 ${READS}/reads_final.2.fastq -o ${READS}/reads.2.jf
${Jellyfish}/jellyfish dump ${READS}/reads.2.jf -o ${READS}/reads.2.fa
# Produce raw estimates of PSI values using FreePSI
#Build
${FreePSI}/freePSI build\
-k $K -p ${THREAD} \
-g ${GENOME_DIR} \
-a ${BND_FILE} \
-1 ${READS}/reads.1.fa \
-2 ${READS}/reads.2.fa \
-o ./hashtable.json
#Quant
${FreePSI}/freePSI quant\
-k $K -p ${THREAD} \
-i ./hashtable.json \
-o .
# Post-process the raw estimates of PSI values
python3 ${FreePSI}/postProc.py \
./psi_freePSI_raw.json \
./psi_freePSI.json
# Summarize the PSI values into a readable file
python3 ${FreePSI}/summary.py \
./annotation/hg38_refGene_exonBoundary_chr21.bed \
./psi_freePSI.json \
./psi_freePSI.summary
I'm going to close this as I was able to produce a working binary using the authors' specific library versions in PR #3.
I've been trying to run freePSI on my own data, and receive a segmentation fault. Here's the traceback when run on the example data: