anbrooks / juncBASE

Junction Based Analysis of Splicing Events for RNA-Seq

Get a problem when running with build_DB_FromGTF.py #1

Closed ChengZhao closed 7 years ago

ChengZhao commented 8 years ago

Hi guys, I ran into a problem when running build_DB_FromGTF.py. I downloaded the UCSC.hg19.knownGene_w_gene_symbol_EnsemblChr_noGL.gtf file and followed the instruction "Ensembl annotations, use only GTF lines that have "exon" in the third column". So I ran these commands:

awk '{if ($3 == "exon") print $0}' UCSC.hg19.knownGene_w_gene_symbol_EnsemblChr_noGL.gtf > UCSC.hg19.knownGene_w_gene_symbol_EnsemblChr_noGL.exon.gtf

python build_DB_FromGTF.py -g UCSC.hg19.knownGene_w_gene_symbol_EnsemblChr_noGL.exon.gtf -d test --sqlite_db_dir TEST --initialize

(This works fine.) But when I run:

python build_DB_FromGTF.py -g UCSC.hg19.knownGene_w_gene_symbol_EnsemblChr_noGL.exon.gtf -d test --sqlite_db_dir TEST

it showed me this error:

Traceback (most recent call last):
  File "build_DB_FromGTF.py", line 837, in <module>
    if __name__ == "__main__": main()
  File "build_DB_FromGTF.py", line 200, in main
    buildAnnotDB(db_obj, chr2gtf_lines, db_name, use_gene_name)
  File "build_DB_FromGTF.py", line 271, in buildAnnotDB
    buildGeneTable(db, db_name, chr)
  File "build_DB_FromGTF.py", line 436, in buildGeneTable
    gene = row["gene_name"]
TypeError: tuple indices must be integers, not str

The Python version is 2.7.6. Could anyone give me some tips? Many thanks in advance.

anbrooks commented 8 years ago

Are you using an updated version of JuncBASE? The current release is now v1.2-beta. There was a change to pysqlite that was causing an error like this. It should have been fixed in the more recent files, but if not, please let me know.
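For readers hitting the same TypeError: a row fetched through Python's sqlite3 module is a plain tuple unless a row factory is set, so indexing it by column name fails. The following is a minimal sketch of that behavior using only the standard library. It is not JuncBASE's actual code or its actual fix; the in-memory table and its values are hypothetical and only stand in for a gene table.

```python
import sqlite3

# Hypothetical in-memory table, used only to illustrate the behavior.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (gene_name TEXT, chr TEXT)")
conn.execute("INSERT INTO gene VALUES (?, ?)", ("geneA", "chr1"))

# Default behavior: each row is a tuple, so a string index raises TypeError.
row = conn.execute("SELECT gene_name, chr FROM gene").fetchone()
default_row_type = type(row)
default_error = None
try:
    row["gene_name"]
except TypeError as err:
    default_error = err

# With sqlite3.Row as the row factory, rows support access by column name
# as well as by position. Cursors created after this point pick it up.
conn.row_factory = sqlite3.Row
row = conn.execute("SELECT gene_name, chr FROM gene").fetchone()
gene = row["gene_name"]
gene_by_position = row[0]

conn.close()
```

If a script relies on name-based access such as row["gene_name"], one thing to check is whether the connection it uses has a row factory like sqlite3.Row configured.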

Bioinfowangm commented 8 years ago

Hi, I used build_DB_FromGTF.py from the v1.2-beta version, and it took a really long time to build the databases (both reference and de novo). I'm using the Drosophila mel r6 GTF. I hope everything is OK.

anbrooks commented 8 years ago

build_DB_FromGTF.py does take a while. This only needs to be done once for each GTF reference.