YangLab / CIRCexplorer2

circular RNA analysis toolset
http://circexplorer2.readthedocs.org/

circularRNA_full.txt - confusing exonCount/exonSize/exonOffsets #33

Closed smahaffey closed 5 years ago

smahaffey commented 5 years ago

I'm merging results between samples using the denovo output file circularRNA_full.txt. I want to compare the full exon structure, not just the start and end coordinates, to ensure a perfect match between samples. It seems that the exon length column[10], which is a comma-separated list, must occasionally contain numbers with thousands separators when they are >=1000. But that explanation doesn't make sense for row 3 below: either there are too few exon lengths, or one exon has a length of 851,000,000,000 bp. Row 4, meanwhile, is an example of exonSizes that exceed 1000 bp and don't have commas.

Here are some example rows from the file that I can't decipher beyond that explanation. The last row, however, shows that some rows with exon sizes >=1000 look as you would expect, so I'm not sure how to interpret this. I've cut off the remaining columns to simplify the example and attached a full file here: Brain.BNLx.2.full.txt

chr start end name score strand thickStart thickEnd itemRgb exonCount exonSizes exonOffsets
1 2012643 2017574 circular_RNA/1 0 - 2012643 2012643 0,0,0 2 1,501,188 0,4743
1 16514096 16530121 circular_RNA/2 0 + 16514096 16514096 0,0,0 2 6,031,121 0,15904
1 29319343 29369105 circular_RNA/2 1 + 29319343 29319343 0,0,0 8 801,081,141,261,851,000,000,000 0,2673,5119,6200,7675,10096,12006,49614
1 16480949 16503517 circular_RNA/2 0 + 16480949 16480949 0,0,0 10 170,2034,96,186,153,3267,230,3709,131,141 0,604,2855,6138,7632,8836,13701,15411,20947,22427

Thank you for any help you can provide on how to interpret these values.
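One way to test rows like these without relying on how a viewer renders them is to check the three exon columns against each other. The sketch below is only an illustration: `check_row` is a hypothetical helper, and it assumes the first 12 columns follow BED12 conventions (exonCount values in each list, and the last offset plus the last size equal to end minus start).

```python
def check_row(line):
    """Return a list of inconsistencies found in one circularRNA_full.txt row."""
    f = line.split()  # assumes whitespace-delimited columns with no spaces inside lists
    start, end, exon_count = int(f[1]), int(f[2]), int(f[9])
    sizes = [int(x) for x in f[10].split(",")]
    offsets = [int(x) for x in f[11].split(",")]

    problems = []
    if len(sizes) != exon_count:
        problems.append(f"exonCount is {exon_count} but exonSizes has {len(sizes)} values")
    if len(offsets) != exon_count:
        problems.append(f"exonCount is {exon_count} but exonOffsets has {len(offsets)} values")
    if any(s <= 0 for s in sizes):
        problems.append("exonSizes contains a non-positive value")
    # BED12 assumption: the last exon must end exactly at the end coordinate.
    if offsets[-1] + sizes[-1] != end - start:
        problems.append("last exon does not end at the end coordinate")
    return problems


# Rows 1 and 4 from the example above.
ROW_1 = "1 2012643 2017574 circular_RNA/1 0 - 2012643 2012643 0,0,0 2 1,501,188 0,4743"
ROW_4 = ("1 16480949 16503517 circular_RNA/2 0 + 16480949 16480949 0,0,0 10 "
         "170,2034,96,186,153,3267,230,3709,131,141 "
         "0,604,2855,6138,7632,8836,13701,15411,20947,22427")

for row in (ROW_1, ROW_4):
    print(row.split()[3], check_row(row) or "consistent")
```

Under these assumptions row 1 is flagged for having more size values than exonCount, while row 4 passes every check.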

kepbod commented 5 years ago

I just got back from vacation; sorry for the late response. Did the spurious data appear when you opened the file in Excel or similar software? If so, it is most likely a conversion problem in Excel. In the txt file, the numbers are correct. Please double-check.

1   29319343    29369105    circular_RNA/2  1   +   29319343    29319343    0,0,0   8   80,108,114,126,185,1010,169,148 0,2673,5119,6200,7675,10096,12006,49614 2   circRNA CUFF.4319.3 CUFF.4319.3 4,5,6,7,8,9,10,11   1:29295801-29319343|1:29369105-29377159
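The corrupted value reported earlier in the thread is consistent with a spreadsheet reading the whole comma-separated list as a single number. The sketch below shows one plausible mechanism, not a confirmed description of Excel's behaviour: it assumes the commas are treated as thousands separators, the number is kept to 15 significant digits, and the result is shown with regrouped separators. `excel_like` is a hypothetical name.

```python
from decimal import Context


def excel_like(cell):
    """Mimic a spreadsheet that reads a comma-separated list as one big number."""
    digits = cell.replace(",", "")  # commas taken as thousands separators
    # Assumption: only 15 significant digits survive; the rest become zeros.
    rounded = Context(prec=15).create_decimal(digits)
    return f"{int(rounded):,}"  # redisplay with separators every three digits


original = "80,108,114,126,185,1010,169,148"  # exonSizes from the text file
print(excel_like(original))
```

Applied to the exonSizes value in the row above, this reproduces the string seen in the spreadsheet view, including the trailing `000,000,000` that looked like an exon of 851,000,000,000 bp.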

smahaffey commented 5 years ago

I'm so sorry; yes, you are right. It looks correct as text. I was just trying to get it into columns to make it easier to look at while writing a script to parse and merge files across samples, but something happened when I opened it in Excel. Thank you for your reply. I'm sorry for missing that and bothering you with it.
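For the merging task itself, reading the file as plain text avoids the spreadsheet conversion entirely. The following is a minimal sketch, assuming whitespace-delimited columns in the order shown in the thread; `structure_key` and `merge_samples` are hypothetical names, and the choice of key fields is an assumption about what counts as a perfect match.

```python
from collections import defaultdict


def structure_key(line):
    """Build a key from coordinates, strand, and the raw exon size/offset lists."""
    f = line.split()
    # exonSizes and exonOffsets stay as strings so nothing is reinterpreted.
    return (f[0], int(f[1]), int(f[2]), f[5], f[10], f[11])


def merge_samples(samples):
    """Map each exon structure to the set of sample names that contain it.

    `samples` is a dict of sample name -> iterable of text lines.
    """
    seen = defaultdict(set)
    for name, lines in samples.items():
        for line in lines:
            if line.strip():  # skip blank lines
                seen[structure_key(line)].add(name)
    return seen
```

A caller could pass open file handles as the iterables, for example `merge_samples({"sampleA": open(path_a), "sampleB": open(path_b)})`, where the sample names and paths are placeholders.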