ClarkLaboratory / IsoLamp

Isoform discovery from long-read amplicon sequencing data
MIT License

remap and count with salmon #4

Closed alexyfyf closed 8 months ago

alexyfyf commented 8 months ago

Hi Josie,

I'm wondering whether it would be better to change the remapping step to the approach suggested for salmon, i.e. map with minimap2 and then run salmon quant with the --ont flag. There are a few discussions and tutorials here: https://www.biostars.org/p/9556364/ https://combine-lab.github.io/salmon-tutorials/2021/ont-long-read-quantification/
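
For concreteness, here is a minimal sketch of what that minimap2 plus salmon workflow could look like, written as a small Python helper that only builds the command lines and does not run them. The file names, the thread count, the "A" (automatic) library type and the helper name build_commands are all illustrative assumptions, not taken from IsoLamp or from the linked tutorial; check them against the salmon and minimap2 documentation before use.

```python
import shlex


def build_commands(transcripts_fa, reads_fq, out_dir, threads=4, lib_type="A"):
    """Build (but do not run) the three commands for alignment-based salmon quantification.

    All paths and defaults here are hypothetical placeholders.
    """
    bam = f"{out_dir}/aln.bam"
    # Align long reads to the transcriptome with the ONT preset, SAM output.
    minimap2 = ["minimap2", "-t", str(threads), "-a", "-x", "map-ont",
                transcripts_fa, reads_fq]
    # Convert the SAM stream from stdin ("-") to BAM.
    samtools = ["samtools", "view", "-b", "-o", bam, "-"]
    # Quantify from the alignments using salmon's long-read error model.
    salmon = ["salmon", "quant", "--ont", "-p", str(threads),
              "-t", transcripts_fa, "-l", lib_type,
              "-a", bam, "-o", f"{out_dir}/salmon_quant"]
    return minimap2, samtools, salmon


if __name__ == "__main__":
    mm2, st, sq = build_commands("transcripts.fa", "reads.fastq", "out")
    # Print shell-ready commands; shlex.join quotes arguments safely (Python 3.8+).
    print(f"{shlex.join(mm2)} | {shlex.join(st)}")
    print(shlex.join(sq))
```

Printing the commands rather than executing them makes it easy to inspect or adapt them before running anything on real data.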

I'm not sure how big a difference it will make, especially for amplicon data.

alexyfyf commented 8 months ago

Also, the TPM calculation doesn't look correct: https://github.com/ClarkLaboratory/IsoLamp/blob/aa4838c3c09664faf0ab86bc2c243b84b9469bcb/scripts/combine_salmon_quants.R#L121C3-L121C83 I couldn't see transcript length being accounted for anywhere. Sorry if I missed something.

josiegleeson commented 8 months ago

Hi,

We have a publication coming out soon in which we will show some benchmarking results. We did try this salmon command, but it actually performed worse. We're not sure why, but it might have something to do with the differences between results from amplicon sequencing and bulk sequencing.

As for TPM, because we have long-read data, we don't normalise for transcript length as you would with short reads. So currently the TPM is simply the transcript abundance * 1M.
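
To illustrate the difference being discussed, here is a minimal Python sketch contrasting the two calculations. It assumes "transcript abundance" means each transcript's share of the total counts; the function names and the toy counts and lengths are made up for illustration and are not taken from combine_salmon_quants.R.

```python
def proportional_tpm(counts):
    """Long-read style: each transcript's share of all counts, scaled to one million."""
    total = sum(counts.values())
    if total == 0:
        return {tx: 0.0 for tx in counts}
    return {tx: n / total * 1_000_000 for tx, n in counts.items()}


def length_normalised_tpm(counts, lengths):
    """Short-read style: divide counts by transcript length before scaling."""
    rates = {tx: counts[tx] / lengths[tx] for tx in counts}
    total = sum(rates.values())
    if total == 0:
        return {tx: 0.0 for tx in counts}
    return {tx: r / total * 1_000_000 for tx, r in rates.items()}


if __name__ == "__main__":
    # Hypothetical toy data: the long transcript has 3x the reads and 3x the length.
    counts = {"tx_short": 100, "tx_long": 300}
    lengths = {"tx_short": 1000, "tx_long": 3000}
    print(proportional_tpm(counts))
    print(length_normalised_tpm(counts, lengths))
```

With these toy numbers, the proportional version gives the longer transcript three quarters of the total, while the length-normalised version splits the total evenly. Both versions sum to one million.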

Hope this helps! Josie.