SAM attributes not added to Aligned.toTranscriptome.out.bam

drewjbeh commented 6 years ago

Hi Alex,

I am currently working on some ribo-seq data from human cell lines. The analysis I want to do, and some of the tools I want to run, require this data mapped to both the genome and the transcriptome. I therefore thought --quantMode TranscriptomeSAM would be the perfect way to avoid mapping twice.

Unfortunately, I also need some non-standard SAM attributes for scripts I have written for downstream analysis. Adding tags with --outSAMattributes only seems to add them to Aligned.sortedByCoord.out.bam, not to Aligned.toTranscriptome.out.bam. Is there a way around this, or am I missing something in the settings?

I have another question regarding Aligned.toTranscriptome.out.bam. My mapping procedure uses --outFilterMultimapNmax 1 to keep only uniquely mapping reads. I understand that Aligned.toTranscriptome.out.bam allows multi-mapping, which is especially obvious for different transcripts of the same gene. Would it be possible to translate mapping coordinates to genes instead of transcripts during --quantMode TranscriptomeSAM? This would greatly reduce the multi-mapping issue and might be useful for anyone who wants gene-level read alignments (perhaps for read counting, or, in my case, for ribo-seq when I am not interested in transcript-level information).

Thanks! Drew

alexdobin commented 6 years ago

Hi Drew,

Currently, only the NH, HI, RG, rB, ch, MC, and vW attributes can be output in the transcriptome file. What attributes do you need for your downstream analysis? Most other genomic attributes cannot be converted into transcriptomic ones.

STAR can count the number of genomically unique reads, even if they map to multiple transcripts of the same gene, but I am not sure how to calculate "genic" coordinates if there are multiple transcripts per gene.
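The idea of a read that is unique at the gene level can be illustrated on the transcriptome output: a read hitting several transcripts is still unambiguous if all of those transcripts belong to one gene. This is a minimal sketch, not STAR's implementation; `gene_unique_reads` and its (read_name, transcript_id) input are hypothetical, and treating "all hits in one gene" as unique only approximates genomic uniqueness (overlapping genes, or repeats within one gene, behave differently).

```python
from collections import defaultdict


def gene_unique_reads(hits, tx_to_gene):
    """Count reads whose transcriptomic alignments all fall within a single gene.

    hits: iterable of (read_name, transcript_id) pairs, one per alignment record
          (a hypothetical, already-parsed view of Aligned.toTranscriptome.out.bam).
    tx_to_gene: dict mapping transcript_id -> gene_id (e.g. built from the GTF).
    Returns a dict gene_id -> number of gene-unique reads.
    """
    genes_per_read = defaultdict(set)
    for read_name, tx in hits:
        genes_per_read[read_name].add(tx_to_gene[tx])

    counts = defaultdict(int)
    for genes in genes_per_read.values():
        # Several transcripts of ONE gene still count; hits in two genes do not.
        if len(genes) == 1:
            (gene,) = genes
            counts[gene] += 1
    return dict(counts)


tx_to_gene = {"T1": "G1", "T2": "G1", "T3": "G2"}
hits = [("r1", "T1"), ("r1", "T2"), ("r2", "T1"), ("r2", "T3"), ("r3", "T3")]
print(gene_unique_reads(hits, tx_to_gene))  # {'G1': 1, 'G2': 1}
```

Here r1 counts for G1 despite hitting two transcripts, while r2 is dropped because its hits span G1 and G2.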

Cheers Alex

drewjbeh commented 6 years ago

Hi Alex,

Thanks for the response. The only tag I really need for this specific analysis is MD. As I understand it, the MD tag describes the mismatching positions of the read itself, so it should be independent of whether the read is mapped to the genome or the transcriptome. Am I right? Would it be possible to add it to the list of attributes available in the transcriptome output?

As for unique mapping to the transcriptome, I think what is needed is an annotation with only one representative transcript per gene (maybe the longest one). This is similar to how we would map ribo-seq reads to a transcriptome anyway, to maximise uniquely mapping reads. If I built a STAR index using such a GTF, it would probably reduce multi-mapping in the transcriptome output file. Does this make sense, or is there something else you could suggest?
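The one-transcript-per-gene annotation can be prototyped on a GTF with the standard library alone. This is a sketch, not a tool from the thread: `longest_transcript_per_gene` and `filter_gtf` are hypothetical names, "longest" is assumed to mean the largest summed exon length, and only the simple `key "value";` attribute form is parsed.

```python
import re
from collections import defaultdict

ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attrs(field):
    """Parse the GTF attribute column (simple 'key "value";' pairs only)."""
    return dict(ATTR_RE.findall(field))


def longest_transcript_per_gene(gtf_lines):
    """Map gene_id -> transcript_id with the largest summed exon length."""
    tx_len = defaultdict(int)
    tx_gene = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = parse_attrs(cols[8])
        tx = attrs.get("transcript_id")
        if tx is None:
            continue
        tx_gene[tx] = attrs["gene_id"]
        tx_len[tx] += int(cols[4]) - int(cols[3]) + 1  # GTF: 1-based, inclusive
    best = {}
    for tx, gene in tx_gene.items():
        # Ties are broken by transcript_id so the choice is deterministic.
        if gene not in best or (tx_len[tx], tx) > (tx_len[best[gene]], best[gene]):
            best[gene] = tx
    return best


def filter_gtf(gtf_lines, keep_tx):
    """Yield comments, lines without transcript_id, and lines of kept transcripts."""
    for line in gtf_lines:
        if line.startswith("#"):
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        tx = parse_attrs(cols[8]).get("transcript_id")
        if tx is None or tx in keep_tx:
            yield line
```

The filtered lines could be written to an abridged GTF and passed to --sjdbGTFfile at index generation. Summed exon length is only one possible criterion; for ribo-seq, a CDS-based choice might be more appropriate.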

Thanks again, Drew

alexdobin commented 6 years ago

Hi Drew,

You are right that the MD tag should be the same for the transcriptomic alignment. The problem is that, by default, the transcriptomic transformation extends soft-clipped ends for compatibility with RSEM, which changes the MD tag. Let me think about whether I can get around it somehow.

Creating an annotation with one representative transcript per gene seems reasonable. I would add the full list of junctions (from the full annotation) at the index generation step, so that you do not miss any annotated junctions even if they are not in the abridged annotation.
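One way to produce that full junction list is to derive every annotated intron from the full GTF. STAR's --sjdbFileChrStartEnd option takes a tab-separated file of chromosome, first intron base, last intron base (1-based) and strand. In this sketch the helper names are hypothetical, and introns are assumed to be the gaps between consecutive exons of each transcript.

```python
import re
from collections import defaultdict

TX_RE = re.compile(r'transcript_id "([^"]*)"')


def annotated_junctions(gtf_lines):
    """Return sorted (chrom, intron_start, intron_end, strand), 1-based inclusive."""
    exons = defaultdict(list)  # (chrom, strand, transcript_id) -> [(start, end), ...]
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        m = TX_RE.search(cols[8])
        if m is None:
            continue
        exons[(cols[0], cols[6], m.group(1))].append((int(cols[3]), int(cols[4])))

    junctions = set()  # a set, because many transcripts share the same introns
    for (chrom, strand, _tx), blocks in exons.items():
        blocks.sort()
        for (_, prev_end), (next_start, _) in zip(blocks, blocks[1:]):
            if next_start > prev_end + 1:  # skip touching or overlapping exons
                junctions.add((chrom, prev_end + 1, next_start - 1, strand))
    return sorted(junctions)


def write_sjdb(junctions, path):
    """Write junctions as chrom<TAB>start<TAB>end<TAB>strand, one per line."""
    with open(path, "w") as fh:
        for chrom, start, end, strand in junctions:
            fh.write(f"{chrom}\t{start}\t{end}\t{strand}\n")
```

The resulting file could then be combined with the abridged annotation at index generation, for example --runMode genomeGenerate --sjdbGTFfile abridged.gtf --sjdbFileChrStartEnd full_junctions.tab (file names are placeholders).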

Cheers Alex

drewjbeh commented 6 years ago

Hi Alex,

Thank you! I know this is a very specific problem that many others probably will never need a solution for, so I appreciate you taking the time to find a fix.

I will keep in mind adding the full junction set - thanks for the suggestion.

ewallace commented 5 years ago

Hi Alex and Drew, I just ran into exactly the same problem. It would be good if --outSAMattributes All put all possible attributes into Aligned.toTranscriptome.out.bam.

I think I ran into this issue for the same reason as Drew: for Ribo-seq data we need transcriptome alignments and also to very sensitively deal with mismatches, notably 5' end mismatches caused by RT adding extra bases.
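For this ribo-seq use case, once an MD tag is available, a read can be flagged when its 5'-most aligned base is a mismatch. This is a minimal sketch with hypothetical helper names. It assumes the 5' end is not soft-clipped (clipped bases never appear in MD) and that MD follows the SAM convention of separating adjacent mismatches and deletions with numbers, e.g. 0C0C8. For a reverse-strand alignment, the read's 5' end is the rightmost aligned base.

```python
import re

MD_TOKEN_RE = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def md_mismatch_offsets(md):
    """Return (mismatch_offsets, reference_span) for an MD tag.

    Offsets are 0-based along the aligned reference span (deletions included).
    """
    offsets = []
    pos = 0
    for num, deletion, base in MD_TOKEN_RE.findall(md):
        if num:
            pos += int(num)              # run of matching bases
        elif deletion:
            pos += len(deletion) - 1     # deleted reference bases, minus the '^'
        else:
            offsets.append(pos)          # single mismatched reference base
            pos += 1
    return offsets, pos


def has_5prime_mismatch(md, is_reverse):
    """True if the read's 5'-most aligned base is a mismatch."""
    offsets, span = md_mismatch_offsets(md)
    five_prime = span - 1 if is_reverse else 0
    return five_prime in offsets


print(md_mismatch_offsets("0C0C8"))         # ([0, 1], 10)
print(has_5prime_mismatch("0C0C8", False))  # True
```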

Related to #743 - ideally it would also be easy to just make the transcriptome alignment from the genome alignment, while keeping the flags.
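Building a transcriptome alignment from a genome alignment starts with projecting genomic positions onto a transcript's exon structure. The sketch below uses a hypothetical `genome_to_transcript` helper, not STAR's code, and handles a single position on either strand. A full conversion would also need to rewrite the CIGAR, flip minus-strand alignments, and reject alignments whose junctions do not match the transcript's introns. Position-dependent tags such as MD would generally need recomputing afterwards.

```python
def genome_to_transcript(pos, exons, strand):
    """Map a 1-based genomic position to a 1-based transcript position.

    exons: (start, end) genomic intervals, 1-based inclusive, for one transcript.
    strand: '+' or '-'. Returns None if pos is outside every exon.
    """
    blocks = sorted(exons)
    if strand == "-":
        # On the minus strand, the transcript's 5' end is the highest coordinate.
        blocks = blocks[::-1]
    offset = 0  # transcript bases contributed by exons already passed
    for start, end in blocks:
        if start <= pos <= end:
            if strand == "+":
                return offset + (pos - start) + 1
            return offset + (end - pos) + 1
        offset += end - start + 1
    return None


exons = [(1, 100), (201, 300)]
print(genome_to_transcript(201, exons, "+"))  # 101
print(genome_to_transcript(201, exons, "-"))  # 100
print(genome_to_transcript(150, exons, "+"))  # None (intronic)
```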

Thanks Edward

btownshend commented 7 months ago

I've hit the same issue: there is no MD tag in Aligned.toTranscriptome.out.bam. Would it be possible to add it?

alexdobin commented 7 months ago

There is no option in STAR to do it, but you can use samtools to add the MD tag to the existing alignments in the BAM file.
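The samtools workaround might look like the following. `samtools calmd` recomputes MD from the alignments and a reference FASTA. For the transcriptome BAM, that FASTA must contain transcript sequences named exactly as in the BAM's @SQ header lines. Building it from the same GTF, e.g. with `gffread -w transcripts.fa -g genome.fa annotation.gtf`, is an assumption here, as are the file names and helper names. The sketch sorts by coordinate first, on the assumption that calmd handles coordinate-sorted input more efficiently.

```python
import subprocess


def calmd_commands(in_bam, transcripts_fa, sorted_bam="tx.sorted.bam"):
    """Return the two samtools invocations as argv lists (nothing is executed here)."""
    return [
        # Coordinate-sort the STAR transcriptome BAM.
        ["samtools", "sort", "-o", sorted_bam, in_bam],
        # Recompute MD against the transcript sequences; -b writes BAM to stdout.
        ["samtools", "calmd", "-b", sorted_bam, transcripts_fa],
    ]


def add_md(in_bam, transcripts_fa, out_bam, sorted_bam="tx.sorted.bam"):
    """Run both commands, writing calmd's stdout to out_bam (needs samtools on PATH)."""
    sort_cmd, calmd_cmd = calmd_commands(in_bam, transcripts_fa, sorted_bam)
    subprocess.run(sort_cmd, check=True)
    with open(out_bam, "wb") as fh:
        subprocess.run(calmd_cmd, check=True, stdout=fh)
```

A call such as add_md("Aligned.toTranscriptome.out.bam", "transcripts.fa", "toTranscriptome.md.bam") would write the tagged copy. Because the extended soft-clipped bases are part of the transcriptomic alignment, calmd will report them as mismatches wherever they differ from the transcript. Keep the original unsorted file if it is also going to RSEM, which expects all alignments of a read to be grouped together.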

ewallace commented 7 months ago

Yes, we know there is no option in STAR; I think the issue ticket is requesting that option as a feature?

Thanks!

alexdobin/STAR, issue #466