COMBINE-lab / salmon

šŸŸ šŸ£ šŸ± Highly-accurate & wicked fast transcript-level quantification from RNA-seq reads using selective alignment
https://combine-lab.github.io/salmon
GNU General Public License v3.0
771 stars, 161 forks

Salmon alevin transcript level outputs without transcript to gene map file #588

Closed: keithgmitchell closed this issue 3 years ago

keithgmitchell commented 3 years ago

Is there a way to produce the transcript-level matrix output?

```
salmon alevin -l ISR \
  -1 /share/illumina/slims_view/cpdfjgh/Un_DTDB267/Project_XCJZ_10XSC0087/JZ_KO_S26_L002_R1_001.fastq.gz \
  -2 /share/illumina/slims_view/cpdfjgh/Un_DTDB267/Project_XCJZ_10XSC0087/JZ_KO_S26_L002_R2_001.fastq.gz \
  --chromium \
  -i /share/biocore/keith/2020-A/scripts/reference_sources/salmon/GRCm38.salmon_decoys \
  -p 4 \
  -o /share/biocore/projects/Zhang_J_UCD/mouse_sc_2020/salmon/salmon_output \
  --dumpMtx
```

```
Version Info: This is the most recent version of salmon.

Transcript to Gene Map File not provided
 Exiting Now.
./salmon.counts.sh: line 39:  8882 Segmentation fault      salmon alevin -l ISR -1 /share/illumina/slims_view/cpdfjgh/Un_DTDB267/Project_XCJZ_10XSC0087/JZ_KO_S26_L002_R1_001.fastq.gz -2 /share/illumina/slims_view/cpdfjgh/Un_DTDB267/Project_XCJZ_10XSC0087/JZ_KO_S26_L002_R2_001.fastq.gz --chromium -i /share/biocore/keith/2020-A/scripts/reference_sources/salmon/GRCm38.salmon_decoys -p 4 -o /share/biocore/projects/Zhang_J_UCD/mouse_sc_2020/salmon/salmon_output --dumpMtx
```
rob-p commented 3 years ago

Hi @keithgmitchell,

Alevin is designed for droplet-based, tagged-end protocols, and in the vast majority of these protocols transcript-level quantification isn't reliable enough to be useful. Since most tagged-end protocols sequence only the 3' end of transcripts, the coverage signal is highly biased, and resolving UMI assignment at the transcript level is usually not possible. Therefore, I wouldn't generally recommend trying to obtain transcript-level counts from alevin, and we haven't tested it in this context.

If you have a particular reason to look at transcript counts and believe it may be reasonable in your specific use case, you can always pass in a transcript-to-gene map that simply maps each transcript to itself, which will result in a transcript-level output matrix. However, I anticipate that the resolution problem will become more difficult in this case, and there will be much more uncertainty in the assignments.

@k3yavi, please feel free to add anything you think I may have missed.
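The identity-map hack can be sketched as a small shell function. This is a minimal sketch under stated assumptions: `make_identity_tgmap` is a hypothetical helper (not part of salmon), its input is the transcript FASTA the index was built from, and its output is a header-less, two-column, tab-separated file, the layout alevin's `--tgMap` expects. Salmon takes each sequence name as the header text up to the first whitespace; if the index was built with `salmon index --gencode`, names are further cut at the first `|`, so the same option is mirrored here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of salmon): write an "identity" transcript-to-gene
# map, i.e. a header-less, tab-separated file in which every transcript is listed
# as its own gene, for use with `salmon alevin --tgMap`.
set -eu

# Usage: make_identity_tgmap TRANSCRIPTS.fa[.gz] [--gencode] > tx2tx.tsv
#   TRANSCRIPTS is assumed to be the transcript FASTA used to build the index;
#   decoy (genome) sequences are left out because salmon does not quantify them.
#   --gencode cuts each name at the first '|', mirroring `salmon index --gencode`.
make_identity_tgmap() {
  local fasta="$1" gencode="${2:-}"
  case "$fasta" in
    *.gz) gzip -dc -- "$fasta" ;;   # decompress a gzipped FASTA on the fly
    *)    cat -- "$fasta" ;;
  esac | awk -v gencode="$gencode" '
    /^>/ {
      id = substr($1, 2)                   # first whitespace-delimited token, minus ">"
      if (gencode == "--gencode") sub(/[|].*/, "", id)
      if (!(id in seen)) {                 # emit each transcript only once
        seen[id] = 1
        print id "\t" id
      }
    }'
}
```

For example, running `make_identity_tgmap transcripts.fa.gz > tx2tx.tsv` and adding `--tgMap tx2tx.tsv` to the `salmon alevin` command in the original post should satisfy the "Transcript to Gene Map File not provided" check; `transcripts.fa.gz` and `tx2tx.tsv` are placeholder names. Each row of the resulting matrix then corresponds to a transcript rather than a gene, with the 3' coverage caveats Rob describes.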

--Rob

msettles commented 3 years ago

Awesome answer, and nice hack, thanks! In this case, one gene has an exon knocked out toward the 3' end. In a KO vs. WT experiment, we are looking for cell types that express isoforms using that exon vs. those that don't. We aren't seeing any reads align to this exon (in the WT sample), so we are hoping that transcript presence/absence may be a proxy signal. So we really just want one gene's transcript-level expression, and then it's fingers crossed.
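Because only one gene matters here, a narrower variant of the same hack might be worth trying (an untested assumption, not a documented alevin feature): start from an ordinary transcript-to-gene map and split only that gene into its transcripts, so the extra assignment uncertainty should mostly stay among that gene's isoforms while every other gene is resolved as usual. `split_one_gene` is a hypothetical helper, and the input map is assumed to be header-less and tab-separated (transcript, then gene), the same layout alevin's `--tgMap` expects.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of salmon): from an ordinary transcript-to-gene map
# (header-less, tab-separated: transcript<TAB>gene), give only ONE gene
# transcript-level rows; every other transcript keeps its original gene ID.
set -eu

# Usage: split_one_gene TX2GENE.tsv GENE_ID > tx2gene_split.tsv
split_one_gene() {
  local tgmap="$1" gene_id="$2"
  awk -F '\t' -v OFS='\t' -v g="$gene_id" '
    $2 == g { print $1, $1; next }   # transcript of the target gene becomes its own "gene"
    { print $1, $2 }                  # all other transcripts are left unchanged
  ' "$tgmap"
}
```

For example, `split_one_gene tx2gene.tsv GENE_OF_INTEREST > tx2gene_split.tsv`, where both file names and `GENE_OF_INTEREST` are placeholders for the existing map and the knocked-out gene's ID as it appears in that map. Isoforms that differ only far from the 3' end will likely still be hard to tell apart.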

keithgmitchell commented 3 years ago

Thanks @rob-p, yeah, what @msettles said ^^^. I will give it a whirl 👍