Closed zslastman closed 5 years ago
Hi. The first bunch of code you posted was implemented to warn the users about discrepancies in transcript names in BAM and annotation files. The second one is to notify how many reads map on the negative strand and are then discarded. This should be done only if reads are aligned on transcript sequences and all reads should align on positive strand. These two steps are not connected because riboWaltz is not designed for BAM files based on genomic coordinates. This choice is due to the main aim of the RiboSeq assay that is supposed to generate RNA fragments from mRNAs but not from introns or intergenic regions.
There might have been a misunderstanding about what riboWaltz can do and which are the inputs it uses. Up to now, riboWaltz only works with BAM from transcriptome alignments. BAM from transcriptome alignments can be generated in two ways:
All the references in the documentation of riboWaltz to genomic sequences refer to the latter case, where you do have a BAM file form a transcriptome alignemnt but the FASTA file contains the chromosome sequences instead of the transcript ones. I am very sorry this might have been misleading.
I'm putting an effort into updating the documentation to make this point as clear as possible.
Bests, Fabio
In the below code, bamtolist checks whether the reads have a transcript matching the annotation:
But it does this before checking if the 'transcript_aligned' options is true:
Thus the bamtolist function doesn't work on genomically aligned bams, which won't have any transcript field.