GoekeLab / bambu

Reference-guided transcript discovery and quantification for long read RNA-Seq data
GNU General Public License v3.0

Counting step includes all reads fed in from the BAM #298

Closed callumparr closed 2 years ago

callumparr commented 2 years ago

1) Are there any filtering steps for reads going from the BAM alignment file to the final transcriptome assembly? I.e., are some reads discarded because they fail a QC filter such as fraction coverage or read identity?

2) Related to no. 1: is the counting at the end based on all reads from the original BAM files, or only on the reads that were used to assemble the transcripts?

andredsim commented 2 years ago

Hi there,

Sorry for the delayed response.

  1. All reads in the BAM file are used to identify novel transcripts; we do not filter reads out based on the statistics you mentioned. However, reads may end up not contributing to the final transcriptome assembly for other reasons, such as the transcript not having enough read support. The full explanation of how this filtering works is best left to the manuscript (the updated preprint should be coming soon).
  2. All reads are used for quantification, even if they do not contribute to the final transcriptome assembly. An example of this occurring is partial reads, which may only overlap some of the exons of a transcript.
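
To make the distinction between the two steps concrete, here is a toy Python sketch. It is not bambu's algorithm or API: the exon-tuple representation of reads, the `MIN_READ_SUPPORT` threshold, and the `discover`, `is_compatible`, and `quantify` helpers are all hypothetical and only illustrate the idea that discovery needs enough read support, while counting sees every read.

```python
from collections import Counter

# Toy data (hypothetical): each read is the tuple of exon IDs it covers.
reads = [
    ("e1", "e2", "e3"),  # full-length read supporting candidate A
    ("e1", "e2", "e3"),  # second full-length read supporting candidate A
    ("e2", "e3"),        # partial read overlapping only some exons of A
    ("e4", "e5"),        # single read for candidate B (too little support)
]

MIN_READ_SUPPORT = 2  # assumed threshold, purely illustrative


def discover(reads, min_support):
    """Keep exon chains observed in at least `min_support` reads."""
    support = Counter(reads)
    return [chain for chain, n in support.items() if n >= min_support]


def is_compatible(read, transcript):
    """True if the read's exons form a contiguous run inside the transcript."""
    n = len(read)
    return any(
        transcript[i:i + n] == read
        for i in range(len(transcript) - n + 1)
    )


def quantify(reads, transcripts):
    """Pass every read through counting, whether or not it aided discovery."""
    counts = {t: 0.0 for t in transcripts}
    unassigned = 0
    for read in reads:
        hits = [t for t in transcripts if is_compatible(read, t)]
        if hits:
            # Toy rule: split an ambiguous read equally among its matches.
            for t in hits:
                counts[t] += 1.0 / len(hits)
        else:
            unassigned += 1
    return counts, unassigned


transcripts = discover(reads, MIN_READ_SUPPORT)
counts, unassigned = quantify(reads, transcripts)
print(transcripts, counts, unassigned)
```

In this sketch the partial read `("e2", "e3")` does not help assemble candidate A, yet it is still counted towards it. The lone `("e4", "e5")` read also enters the counting step but finds no transcript to match; how real tools treat such reads is outside what this toy example models.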