Closed smahaffey closed 5 years ago
I just came back from vacation, sorry for the late response. Did the spurious data appear when you opened the file in Excel or similar software? If so, it is most likely an Excel conversion problem. In the txt file, the numbers are correct. Please double check.
1 29319343 29369105 circular_RNA/2 1 + 29319343 29319343 0,0,0 8 80,108,114,126,185,1010,169,148 0,2673,5119,6200,7675,10096,12006,49614 2 circRNA CUFF.4319.3 CUFF.4319.3 4,5,6,7,8,9,10,11 1:29295801-29319343|1:29369105-29377159
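One way to double check without a spreadsheet is to read the row as plain text. This is only a minimal sketch: it assumes the file is tab-delimited and that, counting from 0, field 9 is the exon count, field 10 the exon sizes and field 11 the exon offsets, which is what the row above suggests. The name `parse_exons` is made up for illustration.

```python
# The example row from this thread, rejoined with tabs
# (assumption: the real file is tab-delimited).
ROW = "\t".join([
    "1", "29319343", "29369105", "circular_RNA/2", "1", "+",
    "29319343", "29319343", "0,0,0", "8",
    "80,108,114,126,185,1010,169,148",
    "0,2673,5119,6200,7675,10096,12006,49614",
    "2", "circRNA", "CUFF.4319.3", "CUFF.4319.3",
    "4,5,6,7,8,9,10,11",
    "1:29295801-29319343|1:29369105-29377159",
])


def parse_exons(line):
    """Return (start, end, exon_count, sizes, offsets) for one row.

    Field positions are an assumption based on the example row.
    """
    fields = line.rstrip("\n").split("\t")
    start, end = int(fields[1]), int(fields[2])
    exon_count = int(fields[9])
    # Split only on the commas that separate list items; no value in
    # the text file carries a thousands separator.
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    return start, end, exon_count, sizes, offsets


start, end, exon_count, sizes, offsets = parse_exons(ROW)

# The declared exon count should match both lists.
counts_match = exon_count == len(sizes) == len(offsets)
# The last exon should end exactly at the end coordinate.
spans_region = offsets[-1] + sizes[-1] == end - start

print(counts_match, spans_region)
```

If either check fails on a row read straight from the txt file, that row is worth a closer look. If both pass, any odd values seen elsewhere were most likely introduced on import.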
I'm so sorry, yes, you are right. It looks correct as text. I was just trying to get it into columns to make it easier to look at while writing a script to parse and merge files across samples, but yes, something happened when I opened it with Excel. Thank you for your reply. I'm sorry for missing that and bothering you with it.
I'm merging results between samples using the output from the denovo circularRNA_full.txt file. I'm trying to compare the exon structure, not just the start and end coordinates, to ensure a perfect match between samples. It seems like the exon length column [10], which is a comma-separated list, must also occasionally be outputting numbers with commas in them if they are >=1000. Except that explanation doesn't make sense for row 3 below either, unless there are too few exon lengths and one exon has a length of 851,000,000,000bp. Then row 4 is an example of exonSizes that exceed 1000bp and don't have commas.
Here are some example rows from the file that I can't quite decipher other than that explanation. However the last row illustrates that some rows look as you would expect with exon sizes >=1000. So I'm not quite sure how to interpret this. I've cut off the remaining columns to simplify the example and I've attached a full file here. Brain.BNLx.2.full.txt
Thank you for any help you can provide on how to interpret these values.
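For the merge itself, a rough sketch of matching on full exon structure rather than just start and end is below. It assumes tab-delimited rows with chromosome, start, end and strand in fields 0, 1, 2 and 5, and exon sizes and offsets in fields 10 and 11 (counting from 0). The helper names, the sample names and the second row with one exon dropped are all made up for illustration and are not real data.

```python
from collections import defaultdict


def structure_key(line):
    """Build a hashable key from coordinates plus exon structure."""
    f = line.rstrip("\n").split("\t")
    sizes = tuple(int(x) for x in f[10].split(",") if x)
    offsets = tuple(int(x) for x in f[11].split(",") if x)
    return (f[0], int(f[1]), int(f[2]), f[5], sizes, offsets)


def samples_by_structure(rows_by_sample):
    """Map each distinct structure to the set of samples containing it."""
    seen = defaultdict(set)
    for sample, rows in rows_by_sample.items():
        for line in rows:
            seen[structure_key(line)].add(sample)
    return dict(seen)


def make_row(chrom, start, end, strand, sizes, offsets):
    """Hypothetical helper: build a 12-field row for the example."""
    count = str(len(sizes.split(",")))
    return "\t".join([chrom, str(start), str(end), "name", "0", strand,
                      str(start), str(start), "0,0,0", count,
                      sizes, offsets])


# Same start and end, but row_b is missing one exon (illustrative only).
row_a = make_row("1", 29319343, 29369105, "+",
                 "80,108,114,126,185,1010,169,148",
                 "0,2673,5119,6200,7675,10096,12006,49614")
row_b = make_row("1", 29319343, 29369105, "+",
                 "80,108,114,126,185,169,148",
                 "0,2673,5119,6200,7675,12006,49614")

groups = samples_by_structure({
    "sampleA": [row_a],
    "sampleB": [row_a],
    "sampleC": [row_b],
})

for key, samples in groups.items():
    print(key[:4], len(key[4]), "exons", sorted(samples))
```

Because the sizes and offsets are part of the key, two circRNAs with identical start and end but different internal exons end up in separate groups.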