YangLab / CIRCexplorer

A combined strategy to identify circular RNAs (circRNAs and ciRNAs) (Zhang et al., Complementary Sequence-Mediated Exon Circularization, Cell (2014), 159:134-147)
http://yanglab.github.io/CIRCexplorer

Issue running CIRCexplorer #7

Closed madmblac closed 9 years ago

madmblac commented 9 years ago

I am trying to use CIRCexplorer with the STAR mapper, and the error "IndexError: list index out of range" keeps occurring. I can't figure out how to fix this, so any help would be appreciated. Here is my command line:

python CIRCexplorer.py -j 0002-surg-funsion-junction.txt --genome=human.hg19.genome --ref=UCSC_Refseq_sno_miRNA_lncipedia_3_0_hg19_11-10-2014.gtf

Start CIRCexplorer 1.1.1
Start to annotate fusion junctions...
Traceback (most recent call last):
  File "CIRCexplorer.py", line 494, in <module>
    annotate_fusion(ref_f, temp1, temp2)
  File "CIRCexplorer.py", line 67, in annotate_fusion
    genes, gene_info = parse_ref1(ref_f) # gene annotations
  File "CIRCexplorer.py", line 201, in parse_ref1
    start = starts[0]
IndexError: list index out of range

kepbod commented 9 years ago

The --ref option does not accept GTF format, so please convert the GTF file into GenePred format (https://github.com/YangLab/CIRCexplorer#note).

madmblac commented 9 years ago

I converted the --ref file to GenePred format and I am still getting an IndexError.

[madmblac@h1 PythonModules]$ python CIRCexplorer.py -j 0002-surg-funsion-junction.txt --genome=human.hg19.genome --ref=RefSeq.txt
Start CIRCexplorer 1.1.1
Start to annotate fusion junctions...
Traceback (most recent call last):
  File "CIRCexplorer.py", line 494, in <module>
    annotate_fusion(ref_f, temp1, temp2)
  File "CIRCexplorer.py", line 67, in annotate_fusion
    genes, gene_info = parse_ref1(ref_f) # gene annotations
  File "CIRCexplorer.py", line 200, in parse_ref1
    ends = [int(x) for x in line.split()[10].split(',')[:-1]]
IndexError: list index out of range

Do you by chance know what I am doing wrong to continue to get this error?


kepbod commented 9 years ago

I think there are still some format errors in your RefSeq.txt. Please make sure its format is exactly the same as that of the ref example file (https://github.com/YangLab/CIRCexplorer/blob/master/example/ref_example.txt). Alternatively, paste a few lines of your RefSeq.txt here and I can help figure out why you are getting these errors.
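A quick way to check a ref file for this kind of problem is to count the whitespace-separated fields on each line. The traceback above shows parse_ref1 reading line.split()[10], so every line needs at least 11 fields. The following is a minimal sketch and not part of CIRCexplorer; the function name find_short_lines and the sample record are made up for illustration.

```python
import io

# parse_ref1 indexes line.split()[10], so index 10 (the 11th field) must exist.
MIN_FIELDS = 11


def find_short_lines(handle, min_fields=MIN_FIELDS):
    """Return (line_number, field_count) for each non-blank line with too few fields."""
    short = []
    for number, line in enumerate(handle, start=1):
        if not line.strip():
            continue  # ignore blank lines
        count = len(line.split())
        if count < min_fields:
            short.append((number, count))
    return short


# Hypothetical 10-field record, shaped like plain gtfToGenePred output.
sample = io.StringIO("NM_TEST\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,\n")
print(find_short_lines(sample))

# For a real file (path is an assumption):
# with open("RefSeq.txt") as handle:
#     print(find_short_lines(handle))
```

Any line reported here would raise the IndexError shown in the traceback, because it has no field at index 10.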

madmblac commented 9 years ago

NM_032291 chr1 + 66999824 67210768 67000041 67208778 25 66999824,67091529,67098752,67101626,67105459,67108492,67109226,67126195,67133212,67136677,67137626,67138963,67142686,67145360,67147551,67154830,67155872,67161116,67184976,67194946,67199430,67205017,67206340,67206954,67208755, 67000051,67091593,67098777,67101698,67105516,67108547,67109402,67126207,67133224,67136702,67137678,67139049,67142779,67145435,67148052,67154958,67155999,67161176,67185088,67195102,67199563,67205220,67206405,67207119,67210768,

NM_001301823 chr1 + 33546729 33586132 33557656 33585783 9 33546729,33549554,33557650,33558882,33560148,33562307,33563667,33583502,33585644, 33547109,33549728,33557823,33559017,33560314,33562470,33563780,33583717,33586132,

NM_013943 chr1 + 25071759 25170815 25072044 25167428 6 25071759,25124232,25140584,25153500,25166350,25167263, 25072116,25124342,25140710,25153607,25166532,25170815,

NM_032785 chr1 - 48998526 50489626 48999844 50489468 14 48998526,49000561,49005313,49052675,49056504,49100164,49119008,49128823,49332862,49511255,49711441,50162984,50317067,50489434, 48999965,49000588,49005410,49052838,49056657,49100276,49119123,49128913,49332902,49511472,49711536,50163109,50317190,50489626,

NM_052998 chr1 + 33546713 33586132 33547850 33585783 12 33546713,33546988,33547201,33547778,33549554,33557650,33558882,33560148,33562307,33563667,33583502,33585644, 33546895,33547109,33547413,33547955,33549728,33557823,33559017,33560314,33562470,33563780,33583717,33586132,

NM_001145277 chr1 + 16767166 16786584 16767256 16785491 7 16767166,16770126,16774364,16774554,16775587,16778332,16785336, 16767348,16770227,16774469,16774636,16775696,16778510,16786584,

NR_126031 chr1 + 33547778 33567493 33567493 33567493 8 33547778,33549554,33557650,33558845,33560148,33562307,33563667,33567433, 33547955,33549728,33557823,33559017,33560314,33562470,33563780,33567493,

Here are the first seven lines of my RefSeq.txt


adomingues commented 9 years ago

If I may chip in: I recently had some issues using genePred annotations created with gtfToGenePred (for another program, not CIRCexplorer). The reason is that the conversion generates only 10 fields, whereas genePred tables from UCSC have 11. The missing field is the first one, bin. I solved the problem by adding a mock column to the file with sed -i 's/^/\.\t/' ensGene.txt. This workaround is usually not needed for tables downloaded from UCSC.

kepbod commented 9 years ago

@madmblac My ref file has one extra column before the isoform id column (the first column of your ref file) that holds the gene symbol (see the example file https://github.com/YangLab/CIRCexplorer/blob/master/example/ref_example.txt or the format note https://github.com/YangLab/CIRCexplorer#note). I think you have noticed this difference. The gtfToGenePred script from the UCSC utilities is fine for converting GTF to GenePred; you just need to add one column afterwards. If you don't want to add gene symbols, you could simply duplicate the isoform id in front of the first column with a script like perl -alne '$"="\t";print "$F[0]\t@F"' RefSeq.txt.
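For readers who prefer Python to Perl, the same column duplication could be sketched roughly as follows. This is an illustrative equivalent of the one-liner above, not a script shipped with CIRCexplorer; the function names, the sample record, and the output file name are assumptions.

```python
def prepend_isoform_id(line):
    """Split a line on whitespace and repeat its first field as a new first column."""
    fields = line.split()
    return "\t".join([fields[0]] + fields)


def convert_file(in_path, out_path):
    """Write a copy of in_path in which every non-blank line gains the extra column."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(prepend_isoform_id(line) + "\n")


# Hypothetical 10-field record; the result has 11 tab-separated fields.
sample = "NM_TEST\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,\n"
print(prepend_isoform_id(sample))

# For a real file (both paths are assumptions):
# convert_file("RefSeq.txt", "RefSeq_with_symbol.txt")
```

Like the Perl version, this uses the isoform id as a stand-in for the gene symbol, so the first column of the output will not contain real gene names.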

madmblac commented 9 years ago

I'll try that, thank you for the suggestion!


kepbod commented 9 years ago

@madmblac and @adomingues Given that many people have problems with the format of the ref file, I have uploaded a new script, fetch_ucsc.py, which automatically downloads the gene annotation file (known genes, RefSeq or Ensembl) in a suitable format. Please try it and give me some feedback. Thanks!

kepbod commented 9 years ago

@adomingues The GenePred format has several variants; UCSC explains them here: http://genome.ucsc.edu/FAQ/FAQformat.html#format9. The format used by CIRCexplorer is Gene Predictions and RefSeq Genes with Gene Names. Other GenePred formats may cause unexpected problems.
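To make the expected layout concrete, here is a rough sketch that reads one line of the "with Gene Names" variant into named fields. The column order follows the UCSC description of that format (gene name first, then the ten standard genePred fields). The RefRecord type, the parse_record function, and the sample line are hypothetical and are not taken from CIRCexplorer's code.

```python
from collections import namedtuple

# Field names are illustrative labels for the 11 columns.
RefRecord = namedtuple(
    "RefRecord",
    "gene_name name chrom strand tx_start tx_end cds_start cds_end "
    "exon_count exon_starts exon_ends",
)


def parse_record(line):
    """Parse one whitespace-separated 11-column line into a RefRecord."""
    f = line.split()
    if len(f) < 11:
        raise ValueError("expected 11 fields, got %d" % len(f))
    # Exon lists end with a trailing comma, so drop empty pieces.
    starts = [int(x) for x in f[9].split(",") if x]
    ends = [int(x) for x in f[10].split(",") if x]
    return RefRecord(
        f[0], f[1], f[2], f[3],
        int(f[4]), int(f[5]), int(f[6]), int(f[7]),
        int(f[8]), starts, ends,
    )


# Hypothetical record with a gene symbol in the first column.
record = parse_record("GENE1\tNM_TEST\tchr1\t+\t100\t500\t150\t450\t2\t100,300,\t200,500,")
print(record.gene_name, record.exon_starts, record.exon_ends)
```

A 10-column line, such as unmodified gtfToGenePred output, is rejected here with a ValueError that states the field count, rather than failing later with an IndexError.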

madmblac commented 9 years ago

The new script works great for downloading the gene annotation file, thanks for adding that!


syxbestmayer commented 8 years ago

I get an error like this:

Start CIRCexplorer 1.1.3
Start to convert fustion reads...
Converted 232608 fusion reads!
Start to annotate fusion junctions...
Traceback (most recent call last):
  File "/usr/local/bin/CIRCexplorer.py", line 491, in <module>
    annotate_fusion(ref_f, temp1, temp2)
  File "/usr/local/bin/CIRCexplorer.py", line 67, in annotate_fusion
    genes, gene_info = parse_ref1(ref_f) # gene annotations
  File "/usr/local/bin/CIRCexplorer.py", line 203, in parse_ref1
    genes[chrom] = Interval(genes[chrom])
  File "/usr/local/lib/python2.7/site-packages/interval.py", line 279, in __init__
    raise TypeError("lower_bound is not hashable.")
TypeError: lower_bound is not hashable.

My command line:

CIRCexplorer.py -f tophat_fusion/accepted_hits.bam -g /data/share/xsy/hg19.index/hg19.fa -ref /home/xsy/CIRCexplorer-1.1.3/ref.txt

kepbod commented 8 years ago

@syxbestmayer Please refer to issue #4!