Closed by madmblac.
The --ref option does not accept GTF format, so please convert the GTF file into GenePred format (https://github.com/YangLab/CIRCexplorer#note).
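If you are not sure which format an annotation file is in, one line is usually enough to tell. The Python sketch below is only a heuristic and is not part of CIRCexplorer; the function name sniff_annotation_format is hypothetical. It assumes tab-separated input. A GTF line has 9 columns with attributes such as gene_id in the last one, while every GenePred variant carries comma-separated exonStarts/exonEnds lists.

```python
def sniff_annotation_format(line: str) -> str:
    """Hypothetical helper: classify one tab-separated annotation line.

    Returns "gtf", "genepred" or "unknown". This is a rough check, not a parser.
    """
    fields = line.rstrip("\n").split("\t")
    # GTF: exactly 9 columns, the 9th holding attributes like gene_id "...";
    if len(fields) == 9 and "gene_id" in fields[8]:
        return "gtf"

    # GenePred variants: exonStarts/exonEnds are comma-separated integer lists
    # written with a trailing comma, which a GTF line never contains.
    def is_coord_list(value: str) -> bool:
        return value.endswith(",") and value.replace(",", "").isdigit()

    if sum(is_coord_list(f) for f in fields) >= 2:
        return "genepred"
    return "unknown"
```

A "genepred" result does not say which GenePred variant the file is, so the column count still has to match what CIRCexplorer expects.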
I converted the --ref file to GenePred format and I am still getting an IndexError.
[madmblac@h1 PythonModules]$ python CIRCexplorer.py -j 0002-surg-funsion-junction.txt --genome=human.hg19.genome --ref=RefSeq.txt
Start CIRCexplorer 1.1.1
Start to annotate fusion junctions...
Traceback (most recent call last):
File "CIRCexplorer.py", line 494, in <module>
(Sent by email in reply to Xiao-Ou Zhang's comment of Thu, Jun 18, 2015 at 1:26 AM: https://github.com/YangLab/CIRCexplorer/issues/7#issuecomment-113041516)
I think there are still some format errors in your RefSeq.txt. Please make sure its format is exactly the same as that of the ref example file (https://github.com/YangLab/CIRCexplorer/blob/master/example/ref_example.txt). Or you could paste some lines of your RefSeq.txt here so I can help you figure out why you got those errors.
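Before pasting lines, a quick local check can catch most layout mismatches. The sketch below assumes the tab-separated, 11-column layout that CIRCexplorer's ref file uses according to this thread (UCSC's "Gene Predictions and RefSeq Genes with Gene Names": geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds). check_ref_line and check_ref_file are hypothetical helpers, not part of CIRCexplorer, and they only test structure, not biological correctness.

```python
# Column order of UCSC's "Gene Predictions and RefSeq Genes with Gene Names"
# layout (assumed here to match the ref example file).
REF_COLUMNS = ["geneName", "name", "chrom", "strand", "txStart", "txEnd",
               "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds"]


def check_ref_line(line: str) -> list:
    """Hypothetical helper: list the problems in one ref line ([] means it looks OK)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(REF_COLUMNS):
        return [f"expected {len(REF_COLUMNS)} tab-separated columns, found {len(fields)}"]
    rec = dict(zip(REF_COLUMNS, fields))
    problems = []
    if rec["strand"] not in ("+", "-"):
        problems.append(f"strand must be + or -, got {rec['strand']!r}")
    # Exon lists are comma-separated, normally with a trailing comma.
    starts = [s for s in rec["exonStarts"].split(",") if s]
    ends = [e for e in rec["exonEnds"].split(",") if e]
    if not all(x.isdigit() for x in starts + ends):
        problems.append("exonStarts/exonEnds must be comma-separated integers")
    elif (not rec["exonCount"].isdigit() or int(rec["exonCount"]) != len(starts)
          or len(starts) != len(ends)):
        problems.append("exonCount does not match the number of exonStarts/exonEnds")
    elif any(int(s) >= int(e) for s, e in zip(starts, ends)):
        problems.append("each exon start must be smaller than its end")
    return problems


def check_ref_file(path: str, max_reports: int = 10) -> int:
    """Hypothetical helper: print problems for the first bad lines, return the bad-line count."""
    bad = 0
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            problems = check_ref_line(line)
            if problems:
                bad += 1
                if bad <= max_reports:
                    print(f"line {lineno}: " + "; ".join(problems))
    return bad
```

A 10-column, tab-separated file produced by gtfToGenePred, for instance, is reported as "found 10" on every line, which points at the missing gene-name column.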
NM_032291 chr1 + 66999824 67210768 67000041 67208778 25 66999824,67091529,67098752,67101626,67105459,67108492,67109226,67126195,67133212,67136677,67137626,67138963,67142686,67145360,67147551,67154830,67155872,67161116,67184976,67194946,67199430,67205017,67206340,67206954,67208755, 67000051,67091593,67098777,67101698,67105516,67108547,67109402,67126207,67133224,67136702,67137678,67139049,67142779,67145435,67148052,67154958,67155999,67161176,67185088,67195102,67199563,67205220,67206405,67207119,67210768,
NM_001301823 chr1 + 33546729 33586132 33557656 33585783 9 33546729,33549554,33557650,33558882,33560148,33562307,33563667,33583502,33585644, 33547109,33549728,33557823,33559017,33560314,33562470,33563780,33583717,33586132,
NM_013943 chr1 + 25071759 25170815 25072044 25167428 6 25071759,25124232,25140584,25153500,25166350,25167263, 25072116,25124342,25140710,25153607,25166532,25170815,
NM_032785 chr1 - 48998526 50489626 48999844 50489468 14 48998526,49000561,49005313,49052675,49056504,49100164,49119008,49128823,49332862,49511255,49711441,50162984,50317067,50489434, 48999965,49000588,49005410,49052838,49056657,49100276,49119123,49128913,49332902,49511472,49711536,50163109,50317190,50489626,
NM_052998 chr1 + 33546713 33586132 33547850 33585783 12 33546713,33546988,33547201,33547778,33549554,33557650,33558882,33560148,33562307,33563667,33583502,33585644, 33546895,33547109,33547413,33547955,33549728,33557823,33559017,33560314,33562470,33563780,33583717,33586132,
NM_001145277 chr1 + 16767166 16786584 16767256 16785491 7 16767166,16770126,16774364,16774554,16775587,16778332,16785336, 16767348,16770227,16774469,16774636,16775696,16778510,16786584,
NR_126031 chr1 + 33547778 33567493 33567493 33567493 8 33547778,33549554,33557650,33558845,33560148,33562307,33563667,33567433, 33547955,33549728,33557823,33559017,33560314,33562470,33563780,33567493,
These are the first seven lines of my RefSeq.txt.
(Sent by email in reply to Xiao-Ou Zhang's comment of Thu, Jun 18, 2015 at 10:46 AM: https://github.com/YangLab/CIRCexplorer/issues/7#issuecomment-113180968)
If I may chip in: I recently had some issues using genePred annotations created with gtfToGenePred (for another program, not CIRCexplorer). The reason is that the conversion generates only 10 fields, whereas genePred tables from UCSC have 11; the missing field is the first one, bin. I solved the problem by adding a mock column to the file with sed -i 's/^/\.\t/' ensGene.txt. This workaround is usually not needed for tables downloaded from UCSC.
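The same workaround can be written in Python for systems whose sed is not GNU sed. This is only a sketch of that sed one-liner: it writes a new file instead of editing in place, the placeholder value "." comes from the sed command, and prepend_column / prepend_column_file are hypothetical names.

```python
def prepend_column(line: str, value: str = ".") -> str:
    """Hypothetical helper: put value and a tab in front of line (like the sed command)."""
    return f"{value}\t{line}"


def prepend_column_file(src: str, dst: str, value: str = ".") -> None:
    """Hypothetical helper: copy src to dst, adding a placeholder first column to every line."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(prepend_column(line, value))
```

For example, prepend_column_file("ensGene.txt", "ensGene.bin.txt") (the output name is made up). Keep in mind this adds a placeholder bin column for the other program; CIRCexplorer instead expects a gene name in the first column.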
@madmblac
My ref file has one extra column before the isoform ID column (the first column of your ref file) that holds the gene symbol (see the example file, https://github.com/YangLab/CIRCexplorer/blob/master/example/ref_example.txt, or the format note, https://github.com/YangLab/CIRCexplorer#note). You may have already noticed this difference. The gtfToGenePred script from the UCSC utilities works fine for converting GTF to GenePred; all you need to do afterwards is add one column. If you don't want to add gene symbols, you can simply duplicate the isoform ID in front of the first column with a one-liner like perl -alne '$"="\t";print "$F[0]\t@F"' RefSeq.txt.
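For readers less familiar with Perl, here is a rough Python equivalent of that one-liner. In the Perl version, -a splits each line on whitespace, $"="\t" makes the interpolated @F join with tabs, and -l strips and re-adds the newline, so every line comes out tab-separated with its first field (the isoform ID) repeated in front. duplicate_first_column and duplicate_first_column_file are hypothetical names; unlike the Perl command, this sketch leaves blank lines unchanged.

```python
def duplicate_first_column(line: str) -> str:
    """Hypothetical helper: repeat the first field and join all fields with tabs."""
    fields = line.split()          # like perl -a: split on any run of whitespace
    if not fields:
        return line                # blank line: leave it as is
    return "\t".join([fields[0]] + fields) + "\n"


def duplicate_first_column_file(src: str, dst: str) -> None:
    """Hypothetical helper: write a copy of src where each line gains a leading copy of its ID."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(duplicate_first_column(line))
```

Because fields are split on any whitespace, this (like the Perl command) also turns space-separated lines into tab-separated ones.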
I'll try that, thank you for the suggestion!
(Sent by email in reply to Xiao-Ou Zhang's comment of Thu, Jun 18, 2015 at 11:08 AM: https://github.com/YangLab/CIRCexplorer/issues/7#issuecomment-113187318)
@madmblac and @adomingues Given that many people have problems with the format of the ref file, I have uploaded a new script, fetch_ucsc.py, which can automatically download a gene annotation file (known genes, RefSeq, or Ensembl) in a suitable format. Please try it and give me some feedback. Thanks!
@adomingues There are several variants of the GenePred format; UCSC explains them here: http://genome.ucsc.edu/FAQ/FAQformat.html#format9. CIRCexplorer uses the "Gene Predictions and RefSeq Genes with Gene Names" variant. Other GenePred variants may introduce unexpected issues.
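To make the difference between variants concrete: UCSC's extended GenePred (for example what gtfToGenePred writes with its -genePredExt option) has 15 columns (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, score, name2, cdsStartStat, cdsEndStat, exonFrames), and UCSC database tables such as refGene put a bin column in front of those. The "with Gene Names" variant is a gene name followed by the first 10 of those columns. The sketch below reorders a line accordingly, assuming the standard UCSC column order and that name2 holds the gene name you want; to_gene_name_genepred is a hypothetical helper, so compare its output with the ref example file before relying on it.

```python
GENEPRED_EXT_COLUMNS = 15  # name .. exonFrames, without the optional leading bin


def to_gene_name_genepred(line: str) -> str:
    """Hypothetical helper: reorder an extended GenePred line into the 11-column layout.

    Accepts 15 columns (genePredExt) or 16 (genePredExt with a leading bin column)
    and returns: name2, then name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd,
    exonCount, exonStarts, exonEnds.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) == GENEPRED_EXT_COLUMNS + 1:
        fields = fields[1:]                  # drop the bin column
    if len(fields) != GENEPRED_EXT_COLUMNS:
        raise ValueError(f"expected 15 or 16 tab-separated columns, got {len(fields)}")
    core = fields[:10]                       # name .. exonEnds
    gene_name = fields[11]                   # name2 (index 10 is score)
    return "\t".join([gene_name] + core) + "\n"
```

If name2 is empty or not what you want, duplicating the isoform ID as suggested earlier in the thread also yields a file in this layout.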
The new script works great for downloading the gene annotation file; thanks for adding it!
(Sent by email in reply to Xiao-Ou Zhang's comment of Thu, Jun 18, 2015 at 11:45 AM: https://github.com/YangLab/CIRCexplorer/issues/7#issuecomment-113197740)
I get an error like this:
Start CIRCexplorer 1.1.3
Start to convert fustion reads...
Converted 232608 fusion reads!
Start to annotate fusion junctions...
Traceback (most recent call last):
File "/usr/local/bin/CIRCexplorer.py", line 491, in <module>
My command line: CIRCexplorer.py -f tophat_fusion/accepted_hits.bam -g /data/share/xsy/hg19.index/hg19.fa -ref /home/xsy/CIRCexplorer-1.1.3/ref.txt
@syxbestmayer Please refer to issue #4!
I am trying to use CIRCexplorer with the STAR mapper, and the error "IndexError: list index out of range" keeps occurring. I can't figure out how to fix this problem; any help would be appreciated. Here is my command line: python CIRCexplorer.py -j 0002-surg-funsion-junction.txt --genome=human.hg19.genome --ref=UCSC_Refseq_sno_miRNA_lncipedia_3_0_hg19_11-10-2014.gtf
Start CIRCexplorer 1.1.1
Start to annotate fusion junctions...
Traceback (most recent call last):
File "CIRCexplorer.py", line 494, in <module>
annotate_fusion(ref_f, temp1, temp2)
File "CIRCexplorer.py", line 67, in annotate_fusion
genes, gene_info = parse_ref1(ref_f) # gene annotations
File "CIRCexplorer.py", line 201, in parse_ref1
start = starts[0]
IndexError: list index out of range