COMBINE-lab / salmon

🐟 🍣 🍱 Highly-accurate & wicked fast transcript-level quantification from RNA-seq reads using selective alignment
https://combine-lab.github.io/salmon
GNU General Public License v3.0

Error bar on TPM estimation #246

Closed AngryMaciek closed 5 years ago

AngryMaciek commented 6 years ago

Dear Authors,

Is there any way to get the information on the uncertainty of the TPM quantification per transcript? It would be useful to have some kind of measure on the range the inferred expression may vary...

Best! Maciek

rob-p commented 6 years ago

Hi @AngryMaciek,

You can get either Gibbs samples from the posterior or bootstrap estimates. For the former, pass --numGibbsSamples <nsamp> to salmon; for the latter, pass --numBootstraps <nsamp>. Either way, the samples are written to a binary, gzipped file at aux_dir/bootstrap/bootstraps.gz (we decided to keep a uniform file name regardless of the sampling type; the type of sampling performed can be determined from aux_dir/meta_info.json).

Two things to point out here. First, the binary file can be converted to TSV, if you prefer, using this script. Second, these are samples of the number of reads assigned to each transcript, not of the TPMs directly. However, you can easily convert each sample of read counts to a sample of TPMs by applying the TPM formula, i.e. TPM_i = 10^6 * (num_reads_i / effective_length_i) / (sum_j (num_reads_j / effective_length_j)).
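As a rough illustration of that conversion, here is a minimal Python sketch. It assumes the samples have already been converted to plain lists of per-transcript read counts (for example from the TSV form) and that the effective lengths are available in the same transcript order. The function names `counts_to_tpm` and `summarize_tpm` and the numbers in the example are hypothetical and are not part of salmon.

```python
import statistics


def counts_to_tpm(num_reads, effective_lengths):
    """Apply the TPM formula to one sample of per-transcript read counts."""
    if len(num_reads) != len(effective_lengths):
        raise ValueError("num_reads and effective_lengths must have equal length")
    # Reads per unit of effective length for each transcript.
    rates = [n / l for n, l in zip(num_reads, effective_lengths)]
    total = sum(rates)
    if total == 0:
        # No reads assigned anywhere in this sample: every TPM is zero.
        return [0.0 for _ in rates]
    return [1e6 * r / total for r in rates]


def summarize_tpm(samples, effective_lengths):
    """Return per-transcript (mean, standard deviation) of TPM across samples.

    `samples` is a list of samples, each a list of per-transcript read counts.
    At least two samples are needed for a sample standard deviation.
    """
    tpm_samples = [counts_to_tpm(s, effective_lengths) for s in samples]
    # zip(*...) regroups the values so that each tuple holds one transcript's
    # TPM across all samples.
    return [
        (statistics.mean(values), statistics.stdev(values))
        for values in zip(*tpm_samples)
    ]


# Hypothetical data: 3 transcripts, 2 samples (Gibbs or bootstrap).
effective_lengths = [1000.0, 500.0, 250.0]
samples = [
    [100.0, 50.0, 0.0],
    [110.0, 40.0, 5.0],
]

for mean_tpm, sd_tpm in summarize_tpm(samples, effective_lengths):
    print(f"mean TPM = {mean_tpm:.1f}, sd = {sd_tpm:.1f}")
```

The TPM formula is applied to each sample separately because the denominator depends on all transcripts in that sample; the spread of the resulting per-transcript TPM values then gives one possible error bar. Quantiles could be used in place of the standard deviation if a range is preferred.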