COMBINE-lab / salmon

🐟 🍣 🍱 Highly-accurate & wicked fast transcript-level quantification from RNA-seq reads using selective alignment
https://combine-lab.github.io/salmon
GNU General Public License v3.0
780 stars 165 forks

Salmon and RSEM TPM discrepancy #612

Open rmurray2 opened 3 years ago

rmurray2 commented 3 years ago

I recently quantified a set of RNA-seq samples using both Salmon and RSEM to eventually do differential isoform analysis. When the overlap of significant isoforms was rather low -- even though the only difference was whether the TPM values were from Salmon or RSEM -- I started comparing the TPM values and saw something strange.

After taking the log2 of the TPM values, I generated these plots. TPM values for salmon are from quant.sf files and for RSEM, *.isoforms.results files.

There are three very consistent commonalities between these plots:

  1. There seems to be a large number of transcripts with zero expression in one but not the other.
  2. There seems to be a set of isoforms that is more highly expressed in Salmon across all plots.
  3. The bulk of reads that are close to the diagonal line are mostly concentrated on the RSEM side.
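
For anyone wanting to put numbers on these observations, here is a minimal Python sketch of the comparison. It is not the code used to make the plots above. It assumes the standard column names (`Name` and `TPM` in Salmon's `quant.sf`; `transcript_id` and `TPM` in RSEM's `*.isoforms.results`), and it adds a pseudocount of 1 before taking log2 so that zero-TPM transcripts stay finite. Both the pseudocount and the helper names `read_tpm` and `compare_tpm` are illustrative choices, not part of either tool.

```python
import csv
import math


def read_tpm(path, id_col, tpm_col="TPM"):
    """Return {transcript_id: TPM} from a tab-separated quantification file."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {row[id_col]: float(row[tpm_col]) for row in reader}


def compare_tpm(salmon, rsem, pseudocount=1.0):
    """Summarise agreement between two {transcript_id: TPM} mappings.

    Returns (summary, log2_pairs), where log2_pairs maps each shared
    transcript to (log2(salmon + pseudocount), log2(rsem + pseudocount)).
    The pseudocount is an assumption made here to avoid log2(0).
    """
    shared = sorted(salmon.keys() & rsem.keys())
    summary = {
        # Transcripts quantified by both tools.
        "shared": len(shared),
        # Zero in one tool but expressed in the other (observation 1).
        "zero_only_in_salmon": sum(
            1 for t in shared if salmon[t] == 0 and rsem[t] > 0
        ),
        "zero_only_in_rsem": sum(
            1 for t in shared if rsem[t] == 0 and salmon[t] > 0
        ),
        # IDs that appear in only one file; worth checking before plotting.
        "salmon_only_ids": len(salmon.keys() - rsem.keys()),
        "rsem_only_ids": len(rsem.keys() - salmon.keys()),
    }
    log2_pairs = {
        t: (
            math.log2(salmon[t] + pseudocount),
            math.log2(rsem[t] + pseudocount),
        )
        for t in shared
    }
    return summary, log2_pairs


# Hypothetical usage; the file paths are placeholders:
# salmon = read_tpm("quant.sf", id_col="Name")
# rsem = read_tpm("sample.isoforms.results", id_col="transcript_id")
# summary, log2_pairs = compare_tpm(salmon, rsem)
```

The `summary` counts correspond to the first observation (zero in one tool but not the other), and the `log2_pairs` values can be passed to any plotting library to redraw the scatter plots.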

And the details of the runs:

RSEM v1.3.2 commands:

indexing:

quant:

salmon v1.4.0 commands:

indexing:

quant:

Any idea what could be going on here?

rob-p commented 3 years ago

Hi @rmurray2,

Thanks again for the detailed question (I answered them in reverse order, so that's why I'm saying "again" here). There are a few things going on that could be leading to differences. They are, in the order I think they will have an effect on the result:

These are the biggest potential sources that I can currently imagine for the differences you are seeing. I would recommend exploring them in approximately this order. If you run into any issues or have any questions as you investigate, please don't hesitate to follow up here or reach out.