Hi,
I'd like to suggest an update to the Cufflinks manual (http://cole-trapnell-lab.github.io/cufflinks/cufflinks/index.html). I hope this is the correct place to do this.
As I use software other than TopHat to map and align RNA reads to the reference, I have to sort my alignment file before submitting it to Cufflinks. Your manual suggests sorting the file with:
sort -k 3,3 -k 4,4n hits.sam > hits.sam.sorted
on the grounds that "The SAM file supplied to Cufflinks must be sorted by reference position".
However, when I sort my file as suggested, Cufflinks fails.
Here is the exact command I ran for sorting:
samtools view alignment.bam | sort -k 3,3 -k 4,4n -T . > alignment.sorted.sam 2> alignment.sorted.log
Here is the command I ran for Cufflinks:
cufflinks --GTF-guide annotation.gtf --library-type fr-firststrand alignment.sorted.sam
Here is my error log:
You are using Cufflinks v2.2.1, which is the most recent release.
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File star_sorted.sam doesn't appear to be a valid BAM file, trying SAM...
[13:50:49] Loading reference annotation.
[13:50:55] Inspecting reads and determining fragment length distribution.
> Processing Locus 211000022278144:1013-1326 [************************ ] 99%
Error: this SAM file doesn't appear to be correctly sorted!
current hit is at 211000022278158:591, last one was at 211000022278157:572
Cufflinks requires that if your file has SQ records in
the SAM header that they appear in the same order as the chromosomes names
in the alignments.
If there are no SQ records in the header, or if the header is missing,
the alignments must be sorted lexicographically by chromsome
name and by position.
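The ordering rule stated in that message (reference name, then position) can be checked on a header-less SAM stream before running Cufflinks. The following is a minimal Python sketch, not part of Cufflinks: the function name first_unsorted_record is hypothetical, and it assumes that "lexicographically" means plain string comparison on column 3 followed by numeric comparison on column 4. Whether Cufflinks applies exactly this comparison internally is an assumption.

```python
from typing import Iterable, Optional, Tuple


def first_unsorted_record(lines: Iterable[str]) -> Optional[Tuple[int, str, int]]:
    """Return (line_number, rname, pos) of the first out-of-order alignment, or None.

    Assumed ordering: reference name compared as a plain string,
    then 1-based position compared as an integer.
    """
    previous = None
    for line_number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        key = (fields[2], int(fields[3]))  # RNAME (column 3), POS (column 4)
        if previous is not None and key < previous:
            return line_number, key[0], key[1]
        previous = key
    return None
```

It can be fed an open file handle, for example first_unsorted_record(open("alignment.sorted.sam")), and reads the file one line at a time, so memory use stays small even for large inputs.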
The workaround was to sort with samtools (samtools sort, which sorts by coordinate by default) or Picard tools (option SORT_ORDER=coordinate) instead of the sort command suggested in the manual.
However, the suggested sort command worked well for smaller alignment files. For example, it worked for BAM alignment files of about 5 GB. The error reported here occurred with BAM alignment files of about 40 GB.
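To make the two ordering cases from the error message concrete (follow the SQ order when the header has SQ records, otherwise sort by name and position), here is a small in-memory Python sketch. It is only an illustration under assumptions: the function name sort_sam_lines is hypothetical, string comparison stands in for "lexicographically", and loading everything into memory is unsuitable for files of the sizes mentioned above, where samtools sort or Picard remain the practical choice.

```python
from typing import Dict, List, Tuple


def sort_sam_lines(lines: List[str]) -> List[str]:
    """Return header lines followed by alignments sorted by reference and position."""
    header = [ln for ln in lines if ln.startswith("@")]
    records = [ln for ln in lines if ln.strip() and not ln.startswith("@")]

    # Reference order taken from the @SQ header lines, if there are any.
    sq_rank: Dict[str, int] = {}
    for ln in header:
        if ln.startswith("@SQ"):
            for tag in ln.rstrip("\n").split("\t")[1:]:
                if tag.startswith("SN:"):
                    sq_rank.setdefault(tag[3:], len(sq_rank))

    def key(ln: str) -> Tuple[int, str, int]:
        fields = ln.split("\t")
        rname, pos = fields[2], int(fields[3])
        # Names not listed in @SQ (all names, when there is no header)
        # share one rank and fall back to plain string order.
        return (sq_rank.get(rname, len(sq_rank)), rname, pos)

    return header + sorted(records, key=key)
```

With a header, the output keeps the header lines first and orders alignments by the SQ sequence; without one, it orders them by reference name as a string and then by position.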