YeoLab / outrigger

Create a *de novo* alternative splicing database, validate splicing events, and quantify percent spliced-in (Psi) from RNA seq data
http://yeolab.github.io/outrigger/
BSD 3-Clause "New" or "Revised" License

outrigger index looping forever while updating gffutils database with novel exons on chr10 (mm10) #84

Open alaindomissy opened 7 years ago

alaindomissy commented 7 years ago

Description

outrigger index loops forever when processing a set of STAR SJ.out.tab files from an mm10 experiment. The step "Updating gffutils database with 1238 novel exons on chromosome chr10 ..." never completes.

Steps to Reproduce

1) Download the 8 SJ.out.tab.gz files for the muscle tissue from this experiment: A circadian gene expression atlas in mammals assayed by RNA-seq https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE54651

wget 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM1321358&format=file&file=GSM1321358%5FMus%5FCT22%5FSJ%2Eout%2Etab%2Egz'
wget 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM1321359&format=file&file=GSM1321359%5FMus%5FCT28%5FSJ%2Eout%2Etab%2Egz'
wget 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM1321359&format=file&file=GSM1321360%5FMus%5FCT28%5FSJ%2Eout%2Etab%2Egz'
wget 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM1321359&format=file&file=GSM1321361%5FMus%5FCT28%5FSJ%2Eout%2Etab%2Egz'
wget 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM1321359&format=file&file=GSM1321362%5FMus%5FCT28%5FSJ%2Eout%2Etab%2Egz'
wget 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM1321359&format=file&file=GSM1321363%5FMus%5FCT28%5FSJ%2Eout%2Etab%2Egz'
wget 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM1321359&format=file&file=GSM1321364%5FMus%5FCT28%5FSJ%2Eout%2Etab%2Egz'
wget 'https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM1321359&format=file&file=GSM1321365%5FMus%5FCT28%5FSJ%2Eout%2Etab%2Egz'

2) Decompress the downloaded files (they are gzip-compressed tables, not tar archives)

gunzip *_SJ.out.tab.gz

3) Run the following script

$ cat outrigger_index.sh
outrigger index \
    --sj-out-tab *SJ.out.tab \
    --gtf /projects/ps-yeolab/genomes/mm10/gencode/m10/gencode.vM10.annotation.gtf
$ ./outrigger_index.sh

Expected behavior:

outrigger index should run to completion and produce the junction index database.

Actual behavior: The step "Updating gffutils database with 1238 novel exons on chromosome chr10 ..." never completes. The output is:

2017-06-05 08:53:45     Creating folder ./outrigger_output ...
2017-06-05 08:53:45             Done.
2017-06-05 08:53:45     Creating folder ./outrigger_output/index ...
2017-06-05 08:53:45             Done.
2017-06-05 08:53:45     Creating folder ./outrigger_output/index/gtf ...
2017-06-05 08:53:45             Done.
2017-06-05 08:53:45     Creating folder ./outrigger_output/junctions ...
2017-06-05 08:53:45             Done.
2017-06-05 08:53:45     Reading SJ.out.files and creating a big splice junction table of reads spanning exon-exon junctions...
2017-06-05 08:53:50     Writing ./outrigger_output/junctions/reads.csv ...

2017-06-05 08:54:09             Done.
2017-06-05 08:54:09     Filtering for only junctions with minimum 10 reads ...
2017-06-05 08:54:21             91830/276507 junctions remain after filtering out 184677 junctions with < 10 reads.
2017-06-05 08:54:21             Done.
2017-06-05 08:54:21     Creating splice junction metadata of merely where junctions start and stop
2017-06-05 08:54:21             Done.
2017-06-05 08:54:21     Writing metadata of junctions to ./outrigger_output/junctions/metadata.csv ...
2017-06-05 08:54:22     Found GTF file in /projects/ps-yeolab/genomes/mm10/gencode/m10/gencode.vM10.annotation.gtf
2017-06-05 08:54:22     Creating a "gffutils" database ./outrigger_output/index/gtf/gencode.vM10.annotation.gtf.db ...
2017-06-05 09:16:59,770 - INFO - Committing changes: 1616000 features
INFO:gffutils.create:Committing changes
2017-06-05 09:17:11,130 - INFO - Populating features table and first-order relations: 1616635 features
INFO:gffutils.create:Populating features table and first-order relations: 1616635 features
2017-06-05 09:17:11,144 - INFO - Creating relations(parent) index
INFO:gffutils.create:Creating relations(parent) index
2017-06-05 09:17:22,907 - INFO - Creating relations(child) index
INFO:gffutils.create:Creating relations(child) index
2017-06-05 09:17:32,367 - INFO - Creating features(featuretype) index
INFO:gffutils.create:Creating features(featuretype) index
2017-06-05 09:17:41             Done.
2017-06-05 09:17:41             Looking up which exons are already defined ...
2017-06-05 09:17:44                     Done.
2017-06-05 09:17:44     Detecting de novo exons based on gaps between junctions ...
2017-06-05 09:17:44             Finding all exons on chromosome chr1 ...
2017-06-05 09:22:08                     Done.
2017-06-05 09:22:08                     Filtering for only novel exons on chromosome chr1 ...
2017-06-05 09:22:08                             Done.
2017-06-05 09:22:08                     Creating gffutils.Feature objects for each novel exon, plus potentially its overlapping gene
2017-06-05 09:27:30                             Done.
2017-06-05 09:27:30                     Updating gffutils database with 1589 novel exons on chromosome chr1 ...
2017-06-05 09:32:57                             Done.
2017-06-05 09:32:57             Finding all exons on chromosome chr10 ...
2017-06-05 09:35:30                     Done.
2017-06-05 09:35:30                     Filtering for only novel exons on chromosome chr10 ...
2017-06-05 09:35:30                             Done.
2017-06-05 09:35:30                     Creating gffutils.Feature objects for each novel exon, plus potentially its overlapping gene
2017-06-05 09:42:32                             Done.
2017-06-05 09:42:32                     Updating gffutils database with 1238 novel exons on chromosome chr10 ...
0 of 470668 (0%)
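
To check whether chr10 simply carries far more junctions than chr1, the per-chromosome junction counts can be tallied directly from the STAR files. The following is a minimal sketch and not part of outrigger: the function name count_junctions_per_chrom is made up for illustration, and it assumes the standard STAR SJ.out.tab layout (column 1 = chromosome, columns 2-3 = intron start and end, column 4 = strand, column 7 = uniquely mapped reads). It applies the 10-read cutoff to each row's unique-read count, which may differ from how outrigger aggregates reads internally.

```shell
# Hypothetical helper (not an outrigger command): count distinct junctions per
# chromosome that have at least MIN_READS uniquely mapped reads in some file.
# Usage: count_junctions_per_chrom MIN_READS file1.SJ.out.tab [file2 ...]
count_junctions_per_chrom() {
    min_reads="$1"
    shift
    # A junction is identified by chromosome, start, end, and strand, so the
    # same junction seen in several samples is counted only once.
    awk -F '\t' -v min="$min_reads" '
        ($7 + 0) >= (min + 0) && !seen[$1, $2, $3, $4]++ { n[$1]++ }
        END { for (chrom in n) print chrom "\t" n[chrom] }
    ' "$@" | LC_ALL=C sort
}
```

Called as count_junctions_per_chrom 10 *SJ.out.tab, it prints one tab-separated line per chromosome with the number of distinct junctions passing the cutoff, which can be compared between chr1 and chr10.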

Versions

$ outrigger --version
outrigger 1.0.0