read assignment data clarification

GoekeLab / bambu

Reference-guided transcript discovery and quantification for long read RNA-Seq data

GNU General Public License v3.0


read assignment data clarification #432

Closed: sparthib closed this issue 5 months ago

sparthib commented 5 months ago

hi there,

I set trackReads = TRUE to get the read assignments. Could I get clarification on the output?

I see 3 columns: readId, equalMatches, compatibleMatches

A lot of the entries under equalMatches and compatibleMatches are NULL, but some entries under compatibleMatches contain more than one value. What do these columns mean, and how is the final transcript assignment chosen when a read is assigned to more than one transcript?

Thanks, Sowmya

andredsim commented 5 months ago

Hi Sowmya,

The descriptions of these columns can be found in the documentation here: https://github.com/GoekeLab/bambu?tab=readme-ov-file#Tracking-read-to-transcript-assignment

NULL means that the read has no equal or compatible match with any of the transcripts in the annotation. Unlike some other tools, bambu does not force each read onto a single transcript: one read can be compatible with several transcripts, and that ambiguity needs to be accounted for in the abundance estimates. Therefore, if a read matches multiple transcripts, its field will contain multiple transcript IDs.
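To illustrate why a multi-ID read is not forced onto one transcript, here is a minimal sketch of a generic expectation-maximization (EM) scheme that splits each ambiguous read fractionally across its compatible transcripts. This is an assumption-laden toy, not bambu's actual quantification code: the dictionary layout (read ID mapped to a list of transcript IDs, with `None` standing in for NULL), the read and transcript names, and the helper functions `classify` and `em_abundance` are all hypothetical, and the sketch ignores the distinction between equalMatches and compatibleMatches.

```python
from collections import Counter

# Hypothetical read-to-transcript table loosely mirroring the
# readId / compatibleMatches columns; None stands in for NULL.
reads = {
    "read1": ["txA"],
    "read2": ["txA"],
    "read3": ["txB"],
    "read4": ["txA", "txB"],  # ambiguous: compatible with two transcripts
    "read5": ["txA", "txB"],
    "read6": None,            # no match with any annotated transcript
}


def classify(matches):
    """Label a read by how many transcripts it is compatible with."""
    if not matches:
        return "unassigned"
    return "unique" if len(matches) == 1 else "ambiguous"


def em_abundance(reads, n_iter=100):
    """Generic EM: share each read across its compatible transcripts.

    E-step: split every read in proportion to the current abundance
    estimates of its compatible transcripts.
    M-step: re-estimate abundances from those fractional counts.
    Reads with no match (None) are left out entirely.
    """
    assigned = [m for m in reads.values() if m]
    transcripts = sorted({t for m in assigned for t in m})
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = {}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in transcripts}
        for matches in assigned:
            total = sum(theta[t] for t in matches)
            for t in matches:
                counts[t] += theta[t] / total
        theta = {t: c / len(assigned) for t, c in counts.items()}
    return counts


if __name__ == "__main__":
    # 3 unique, 2 ambiguous, 1 unassigned
    print(Counter(classify(m) for m in reads.values()))
    # txA ~ 3.33, txB ~ 1.67: the two ambiguous reads are split 2:1,
    # following the unique evidence, rather than given to one transcript
    print(em_abundance(reads))
```

The fractional counts still sum to the number of matched reads (5 here), so no read is double-counted; the ambiguity is resolved in the abundance estimate rather than by picking one transcript per read.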

Hope this helps, Andre Sim