NBaileyNCL closed 6 years ago
What is the source of that annotation file? You only show the output of the seqret and Biopython attempts to "convert" the file, but the problem seems to be the non-standard format of the original file, TgalA1GenomeAnnotation2.gtf, which you did not show. For example, what is the output of this command:
fgrep TGA_000005100.1 TgalRNAseqAnalysis2/SequenceData/TgalA1GenomeAnnotation2.gtf
The proper resolution would be to go back to the source and ask for the annotation to be produced in a proper format (GTF or GFF3). Barring that, some light scripting can probably salvage the necessary annotation data from that file (and discard all the sequence, translation, ortholog data etc.). I guess I could come up with some quick'n'dirty Perl code to do just that if you show me the original format, as I asked above.
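A rough idea of what such a salvage script might look like, written here in shell/awk rather than Perl. This is only a sketch under a big assumption, since the original file was never shown: it assumes the records are tab-separated with the usual nine columns, and that the extra data (translation, ortholog and so on) sits in the ninth column as semicolon-separated `key "value"` pairs. The function name `salvage_gtf` is hypothetical.

```shell
# Hypothetical helper: keep only gene_id / transcript_id in column 9.
# ASSUMPTION: input lines are tab-separated, 9 columns, GTF-style attributes.
salvage_gtf() {
  awk -F'\t' -v OFS='\t' '
    /^#/ || NF != 9 { next }               # skip comments and non-record lines
    {
      n = split($9, attrs, ";")
      kept = ""
      for (i = 1; i <= n; i++) {
        a = attrs[i]
        gsub(/^[ \t]+|[ \t]+$/, "", a)     # trim surrounding whitespace
        if (a ~ /^(gene_id|transcript_id) /) kept = kept a "; "
      }
      if (kept == "") next                 # no usable attributes on this line
      sub(/ $/, "", kept)                  # drop the trailing space
      $9 = kept
      print
    }
  ' "$1"
}
```

It could then be run as `salvage_gtf TgalA1GenomeAnnotation2.gtf > cleaned.gtf` (the output name is a placeholder). If the real file is laid out differently, the filter would need adjusting.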
Thanks for the offer gpertea, you're right: the issue was with the annotation file format. The source was an EMBL file (unpublished, and I'm not sure whether it's in a database), but converting the annotations from GFF to GTF using gffread, and the SAM alignment to a sorted BAM file using samtools, solved the issue.
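For anyone landing here with the same problem, the two conversions described above might look roughly like the sketch below. The exact commands used in this thread were not posted, so the wrapper function `convert_for_cufflinks` and the file names are hypothetical; `gffread -T` (write GTF instead of GFF3) and `samtools sort -o` are the standard options in recent versions of those tools.

```shell
# Hypothetical wrapper; $1 = GFF annotation, $2 = SAM alignment.
# Output names are derived by swapping the file extension.
convert_for_cufflinks() {
  # -T makes gffread write GTF rather than its default GFF3 output
  gffread "$1" -T -o "${1%.*}.gtf"
  # coordinate-sort the alignments and write them as BAM
  samtools sort -o "${2%.*}.sorted.bam" "$2"
}
```

Called as `convert_for_cufflinks annotation.gff aligned.sam`, this would write `annotation.gtf` and `aligned.sorted.bam` next to the inputs.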
Hi all,
I wondered if you could give me some help with an issue I'm having using Cufflinks to assemble my aligned RNA-seq data. I ran Cufflinks with the command:
I suspected a memory issue, but I have tried this on multiple machines, including a 12-core, 50 GB computer and a powerful bioinformatics server at my facility, with the same issue.
Any help would be greatly appreciated.
Nick