cole-trapnell-lab / cufflinks

Boost Software License 1.0
Cufflinks - Inconsistent Results for SAM/BAM File Sorted by Reference Position #134

Open CharlesARoy opened 2 years ago

CharlesARoy commented 2 years ago

Hello, I've been using Cufflinks to process SAM/BAM files that were sorted with Samtools and am getting some inconsistent results.

As mentioned in the Samtools documentation, Samtools sorts alignments by leftmost coordinate, which I think satisfies the requirement in the Cufflinks documentation that "the SAM file supplied to Cufflinks must be sorted by reference position". At least, this issue seemed to indicate that using Samtools sort was fine. Just to be safe, I tried re-sorting the already sorted SAM files with sort -k 3,3 -k 4,4n per the Cufflinks documentation, and the resulting SAM file was identical to the input.

The SAM/BAM files I'm using contain many reads with the same POS coordinate. Depending on how earlier steps were performed in the pipeline, Samtools doesn't always sort the reads in exactly the same way. The reads will be sorted properly by the POS field, but reads with the same POS value don't always appear in the same order. This may be an issue with Samtools, but I don't think it should affect Cufflinks the way it does.
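To make the tie-ordering problem concrete, here is a minimal sketch of one way to force a fixed order among reads that share the same reference and POS. It is plain Python over SAM text lines, not part of Samtools or Cufflinks; the function name `canonicalize_ties` and the choice of tie-break key (QNAME, then FLAG, then the whole line) are assumptions made for illustration. It assumes the input is already coordinate-sorted and only reorders records within each run of identical (RNAME, POS).

```python
from itertools import groupby


def canonicalize_ties(sam_lines):
    """Return SAM lines with records sharing (RNAME, POS) in a fixed order.

    Hypothetical helper: assumes tab-delimited SAM text that is already
    sorted by reference position, with lines stripped of trailing newlines.
    """
    header = [ln for ln in sam_lines if ln.startswith("@")]
    records = [ln for ln in sam_lines if ln and not ln.startswith("@")]

    def coord(line):
        # SAM columns are 1-based in the spec: RNAME is column 3, POS is column 4.
        fields = line.split("\t")
        return fields[2], int(fields[3])

    def tie_key(line):
        # Assumed tie-break: QNAME, then FLAG, then the full line as a last resort.
        fields = line.split("\t")
        return fields[0], int(fields[1]), line

    out = list(header)
    # groupby only merges consecutive records, so the existing coordinate
    # order is preserved and only ties are reordered.
    for _, group in groupby(records, key=coord):
        out.extend(sorted(group, key=tie_key))
    return out
```

For a BAM file, the text records could first be obtained with `samtools view -h`. Running both inputs through a step like this before Cufflinks would show whether the FPKM differences come purely from tie order.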

When given two such SAM/BAM files that were pre-processed slightly differently but which contain exactly the same reads, the Cufflinks FPKM results differ on some lines. They don't differ drastically, but I would expect them to be identical given that the inputs are identical other than slight differences in the sort order within a given POS value.
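The claim that two files "contain exactly the same reads" and differ only in tie order can be checked with a short script. The sketch below is an illustration in plain Python, not a Samtools or Cufflinks feature; the function names are hypothetical, and it assumes both inputs are lists of tab-delimited SAM text lines without trailing newlines.

```python
from collections import Counter


def alignment_records(sam_lines):
    """Drop header lines (starting with '@') and empty lines."""
    return [ln for ln in sam_lines if ln and not ln.startswith("@")]


def same_reads_any_order(lines_a, lines_b):
    """True if both inputs hold the same alignment records, ignoring order."""
    return Counter(alignment_records(lines_a)) == Counter(alignment_records(lines_b))


def differ_only_in_tie_order(lines_a, lines_b):
    """True if the records match and the (RNAME, POS) sequence is identical.

    When this holds, any difference between the two inputs is confined to
    the relative order of reads that share the same reference and position.
    """
    def coords(lines):
        return [
            (fields[2], int(fields[3]))
            for fields in (ln.split("\t") for ln in alignment_records(lines))
        ]

    return same_reads_any_order(lines_a, lines_b) and coords(lines_a) == coords(lines_b)
```

If `differ_only_in_tie_order` returns True for the two inputs while the Cufflinks FPKM values still differ, that supports the suspicion that the result depends on the order of reads within a single POS value.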