Closed. olgabot closed this issue 7 years ago.
Description
Tested outrigger index in a directory with 18 SJ.out.tab files and got the same error, along with some dtype warnings. Ran locally on an Ubuntu 14.04 system with 32 GB of RAM.
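The dtype warnings mentioned here look like pandas' DtypeWarning. pandas emits it when column 0 of an SJ.out.tab file (the chromosome name) mixes numeric-looking values such as 1 to 19 with non-numeric ones such as X, Y or MT. In that case, different chunks of the file can be inferred as different types. Below is a minimal sketch of reading one file with that column forced to str, assuming pandas is installed. The column names are hypothetical labels for STAR's nine SJ.out.tab columns and not necessarily what outrigger uses internally. This is not outrigger's own reader.

```python
import io

import pandas as pd

# STAR's SJ.out.tab has nine tab-separated columns and no header row.
# These names are illustrative labels (hypothetical), not outrigger's internal names.
SJ_COLUMNS = [
    "chrom", "intron_start", "intron_stop", "strand", "intron_motif",
    "annotated", "unique_reads", "multimap_reads", "max_overhang",
]


def read_sj_out_tab(path_or_buffer):
    """Read one SJ.out.tab file with the chromosome column forced to str.

    Without an explicit dtype, pandas may infer int64 for chunks holding
    only "1".."19" and object for chunks holding "X", "Y" or "MT", which
    triggers "DtypeWarning: Columns (0) have mixed types".
    """
    return pd.read_csv(
        path_or_buffer,
        sep="\t",
        header=None,
        names=SJ_COLUMNS,
        dtype={"chrom": str},
    )


# Two made-up junctions: one on chromosome 1, one on chromosome X.
example = (
    "1\t3207318\t3213438\t2\t2\t1\t5\t0\t38\n"
    "X\t100200\t100900\t1\t1\t0\t12\t1\t40\n"
)
junctions = read_sj_out_tab(io.StringIO(example))
print(junctions["chrom"].tolist())  # ['1', 'X']
```

The warning's other suggestion, low_memory=False, silences it by inferring each column over the whole file at once. An explicit str dtype additionally guarantees that chromosome 1 is never the integer 1 in some rows and the string '1' in others.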
Version
$ outrigger --version
outrigger 1.0.0
Terminal output
(outrigger-env) /.../star_sjout$ outrigger index --sj-out-tab *SJ.out.tab --gtf /.../Mus_musculus.GRCm38.84.gtf
2017-04-13 12:32:41 Creating folder ./outrigger_output ...
2017-04-13 12:32:41 Done.
2017-04-13 12:32:41 Creating folder ./outrigger_output/index ...
2017-04-13 12:32:41 Done.
2017-04-13 12:32:41 Creating folder ./outrigger_output/index/gtf ...
2017-04-13 12:32:41 Done.
2017-04-13 12:32:41 Creating folder ./outrigger_output/junctions ...
2017-04-13 12:32:41 Done.
2017-04-13 12:32:41 Reading SJ.out.files and creating a big splice junction table of reads spanning exon-exon junctions...
/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/joblib/parallel.py:131: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  return [func(*args, **kwargs) for func, args, kwargs in self.items]
/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/joblib/parallel.py:131: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  return [func(*args, **kwargs) for func, args, kwargs in self.items]
/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/joblib/parallel.py:131: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  return [func(*args, **kwargs) for func, args, kwargs in self.items]
/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/joblib/parallel.py:131: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  return [func(*args, **kwargs) for func, args, kwargs in self.items]
/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/joblib/parallel.py:131: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  return [func(*args, **kwargs) for func, args, kwargs in self.items]
/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/joblib/parallel.py:131: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  return [func(*args, **kwargs) for func, args, kwargs in self.items]
/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/joblib/parallel.py:131: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  return [func(*args, **kwargs) for func, args, kwargs in self.items]
/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/joblib/parallel.py:131: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  return [func(*args, **kwargs) for func, args, kwargs in self.items]
2017-04-13 12:33:19 Writing ./outrigger_output/junctions/reads.csv ...
2017-04-13 12:33:37 Done.
2017-04-13 12:33:37 Filtering for only junctions with minimum 10 reads ...
2017-04-13 12:33:44 134088/451740 junctions remain after filtering out 317652 junctions with < 10 reads.
2017-04-13 12:33:44 Done.
2017-04-13 12:33:44 Creating splice junction metadata of merely where junctions start and stop
2017-04-13 12:33:45 Done.
2017-04-13 12:33:45 Writing metadata of junctions to ./outrigger_output/junctions/metadata.csv ...
2017-04-13 12:33:46 Done.
2017-04-13 12:33:46 Found GTF file in /home/hnasko-lab/Documents/genomes/Mus_musculus.GRCm38.84.gtf
2017-04-13 12:33:46 Creating a "gffutils" database ./outrigger_output/index/gtf/Mus_musculus.GRCm38.84.gtf.db ...
2017-04-13 12:42:11,733 - INFO - Committing changes: 1589000 features
INFO:gffutils.create:Committing changes
2017-04-13 12:42:20,565 - INFO - Populating features table and first-order relations: 1589641 features
INFO:gffutils.create:Populating features table and first-order relations: 1589641 features
2017-04-13 12:42:20,566 - INFO - Creating relations(parent) index
INFO:gffutils.create:Creating relations(parent) index
2017-04-13 12:42:22,698 - INFO - Creating relations(child) index
INFO:gffutils.create:Creating relations(child) index
2017-04-13 12:42:25,457 - INFO - Creating features(featuretype) index
INFO:gffutils.create:Creating features(featuretype) index
2017-04-13 12:42:26 Done.
2017-04-13 12:42:26 Looking up which exons are already defined ...
2017-04-13 12:42:27 Done.
2017-04-13 12:42:27 Detecting de novo exons based on gaps between junctions ...
2017-04-13 12:42:27 Finding all exons on chromosome 1 ...
2017-04-13 12:43:22 Done.
2017-04-13 12:43:22 Filtering for only novel exons on chromosome 1 ...
2017-04-13 12:43:22 Done.
2017-04-13 12:43:22 Creating gffutils.Feature objects for each novel exon, plus potentially its overlapping gene
2017-04-13 12:43:25 Done.
2017-04-13 12:43:25 Updating gffutils database with 57 novel exons on chromosome 1 ...
2017-04-13 12:44:20 Done.
2017-04-13 12:44:20 Finding all exons on chromosome 10 ...
2017-04-13 12:44:59 Done.
2017-04-13 12:44:59 Filtering for only novel exons on chromosome 10 ...
2017-04-13 12:44:59 Done.
2017-04-13 12:44:59 Creating gffutils.Feature objects for each novel exon, plus potentially its overlapping gene
2017-04-13 12:45:09 Done.
2017-04-13 12:45:09 Updating gffutils database with 86 novel exons on chromosome 10 ...
2017-04-13 15:01:10 Done.
2017-04-13 15:01:10 Finding all exons on chromosome 11 ...
2017-04-13 15:04:05 Done.
2017-04-13 15:04:05 Filtering for only novel exons on chromosome 11 ...
2017-04-13 15:04:05 Done.
2017-04-13 15:04:05 Creating gffutils.Feature objects for each novel exon, plus potentially its overlapping gene
2017-04-13 15:04:20 Done.
2017-04-13 15:04:20 Updating gffutils database with 130 novel exons on chromosome 11 ...
Traceback (most recent call last):
  File "/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/gffutils/create.py", line 981, in _update_relations
    self._insert(f, c)
  File "/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/gffutils/create.py", line 510, in _insert
    cursor.execute(constants._INSERT, feature.astuple())
sqlite3.IntegrityError: UNIQUE constraint failed: features.id

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hnasko-lab/anaconda2/envs/outrigger-env/bin/outrigger", line 11, in <module>
    sys.exit(main())
  File "/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/outrigger/commandline.py", line 980, in main
    cl = CommandLine(sys.argv[1:])
  File "/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/outrigger/commandline.py", line 307, in __init__
    self.args.func()
  File "/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/outrigger/commandline.py", line 311, in index
    index.execute()
  File "/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/outrigger/commandline.py", line 705, in execute
    metadata, db)
  File "/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/outrigger/commandline.py", line 576, in make_exon_junction_adjacencies
    exon_junction_adjacencies.detect_exons_from_junctions()
  File "/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/outrigger/index/adjacencies.py", line 227, in detect_exons_from_junctions
    transform=transform)
  File "/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/gffutils/interface.py", line 827, in update
    db._update_relations()
  File "/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/gffutils/create.py", line 983, in _update_relations
    fixed, final_strategy = self._do_merge(f, 'merge')
  File "/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/gffutils/create.py", line 288, in _do_merge
    self._add_duplicate(orig_id, uniqued_feature.id)
  File "/home/hnasko-lab/anaconda2/envs/outrigger-env/lib/python3.5/site-packages/gffutils/create.py", line 360, in _add_duplicate
    (idspecid, newid))
sqlite3.IntegrityError: UNIQUE constraint failed: duplicates.newid
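The traceback shows two chained failures. First, inserting a novel exon whose ID already exists in the features table raises an IntegrityError. Second, the 'merge' fallback that gffutils then tries (_do_merge, then _add_duplicate) fails because the new ID it records in the duplicates table is also already taken. Below is a minimal stdlib sqlite3 sketch of that chain. The two-table schema is simplified from the table and column names visible in the traceback (features.id, duplicates.idspecid and newid). The function name, feature ID and "_1" renaming scheme are hypothetical and are not gffutils' actual implementation.

```python
import sqlite3

# Simplified schema modeled on names in the traceback; NOT gffutils' full schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id TEXT PRIMARY KEY, seqid TEXT, start INT, stop INT)")
conn.execute("CREATE TABLE duplicates (idspecid TEXT, newid TEXT PRIMARY KEY)")


def insert_with_merge_fallback(conn, feature_id, seqid, start, stop):
    """Insert a feature; on an ID clash, retry under a suffixed ID.

    Hypothetical stand-in for a 'rename on duplicate' strategy: the renamed
    ID is recorded in `duplicates`, whose `newid` column must be unique.
    """
    try:
        conn.execute("INSERT INTO features VALUES (?, ?, ?, ?)",
                     (feature_id, seqid, start, stop))
    except sqlite3.IntegrityError:
        new_id = feature_id + "_1"  # hypothetical renaming scheme
        # If new_id was already recorded, this raises a second IntegrityError
        # while the first one is still being handled (a chained exception).
        conn.execute("INSERT INTO duplicates VALUES (?, ?)", (feature_id, new_id))
        conn.execute("INSERT INTO features VALUES (?, ?, ?, ?)",
                     (new_id, seqid, start, stop))


fid = "exon:11:100-200"  # hypothetical feature ID
insert_with_merge_fallback(conn, fid, "11", 100, 200)  # first insert succeeds
insert_with_merge_fallback(conn, fid, "11", 100, 200)  # stored as fid + "_1"
try:
    insert_with_merge_fallback(conn, fid, "11", 100, 200)  # renamed ID collides
except sqlite3.IntegrityError as err:
    final_error = err       # keep a reference; `err` is unbound after the block
    print(err)              # UNIQUE constraint failed: duplicates.newid
    print(err.__context__)  # UNIQUE constraint failed: features.id
```

In this toy model, the third insert of the same ID collides on the renamed ID that the second insert already recorded. That reproduces the "During handling of the above exception" pattern, but it does not establish which IDs collided in the real run.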
Here's more information on the commands and output:
https://gist.github.com/olgabot/f51b795b62c71f2b2cdb8cd586bdaef4
I'm working on a fix; we'll see if it works. Otherwise, I think this will be fixed by revamping the command-line inputs to be more explicit (https://github.com/YeoLab/outrigger/issues/78) and to avoid clashes between databases.
Description
There is an error that occurs when adding novel exons to the gffutils.FeatureDB during outrigger index.
Steps to Reproduce
On the branch v1.0.0rc1, there are additional SJ.out.tab files for testing. Using these files, there's an error when finding novel exons on chromosome 4:
Expected behavior: Expected outrigger index to complete without error.
Actual behavior: Got this error
Versions