comprna / reorientexpress

Transcriptome long-read orientation with Deep Learning
MIT License

Reads strand after predictions #3

Open akramdi opened 4 years ago

akramdi commented 4 years ago

Hi,

I was able to run reorientexpress and map the stranded reads using minimap2:

minimap2 -ax splice -G 30k -u f $CHRLEN $strandedFASTQ

Minimap2 reports the transcript strand in the ts tag of the alignment files. I noticed that, for some reads, the strand reported in the ts tag does not match the read strand reported by the SAM flags. This usually occurs with unstranded libraries, but here the FASTQ files are supposed to be stranded.
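To make this comparison concrete, here is a minimal Python sketch that reads SAM records as plain text and reports, for each mapped primary alignment, the alignment strand from the FLAG field (bit 0x10 set means the read aligned to the reverse strand) next to the value of the ts tag. The function names and the example records are made up for illustration, and the sketch does not assume any particular interpretation of ts; it only tallies the two values side by side.

```python
from collections import Counter

def flag_strand(flag: int) -> str:
    """Alignment strand from the SAM FLAG: bit 0x10 marks a reverse-strand alignment."""
    return "-" if flag & 0x10 else "+"

def ts_tag(fields):
    """Return the value of the ts:A: tag ('+' or '-'), or None if the tag is absent."""
    for field in fields[11:]:  # optional tags start at column 12
        if field.startswith("ts:A:"):
            return field[5:]
    return None

def tally(sam_lines):
    """Count (FLAG strand, ts value) pairs over mapped primary alignments."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blank and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100) and supplementary (0x800) records.
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        counts[(flag_strand(flag), ts_tag(fields))] += 1
    return counts

# Hypothetical records, made up for illustration only.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\tNM:i:0\tts:A:+",
    "read2\t16\tchr1\t200\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\tNM:i:0\tts:A:-",
    "read3\t16\tchr1\t300\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\tNM:i:0",
]

for (strand, ts), n in sorted(tally(example).items(), key=str):
    print(strand, ts, n)
```

On a real alignment, the same function could be fed an open SAM file handle, since it only iterates over lines.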

Here is a snapshot of two genes that should produce transcripts in the forward orientation.

  1. If I color the reads using the ts tag, they agree with the gene orientation in both cases:

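[image: reorient_question-color-ts] https://user-images.githubusercontent.com/8793228/66324805-70158b80-e926-11e9-8db4-956d85a8c7f3.png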

  2. But if I keep the default coloring (read strands), the orientation is off:

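[image: reorient_question] https://user-images.githubusercontent.com/8793228/66324296-748d7480-e925-11e9-92f9-287323497598.png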

Because the libraries are stranded now, shouldn't we expect the read strands to match the actual transcript orientation? Can you help me understand why this is not the case?

Thanks for the help, Amira

EduEyras commented 4 years ago

Hi Amira,

To avoid confusion, I like calling the strand of the read "orientation" and the strand of the gene "strand".

They do not have to coincide. If you do direct-RNA-seq, all orientations are forward, but the reads may still map to either the forward or the reverse strand. In that case, in theory, one would not need to reorient the reads.

When you say that the libraries are stranded, do you mean that they are cDNA libraries with stranded adapters? What did you use to determine the adapter and/or orientation of the cDNA reads?

Best

Eduardo

-- Prof. E Eyras EMBL Australia Group Leader The John Curtin School of Medical Research - Australian National University https://github.com/comprna http://scholar.google.com/citations?user=LiojlGoAAAAJ

akramdi commented 4 years ago

Hi Eduardo,

I have cDNA libraries (no stranded adapters) and I used reorientexpress to predict the 5'-to-3' orientation of the reads. What is shown in the snapshots above is the mapping result of the predicted reads. Before the prediction, I had a mixture of orientations mapping to each gene (which is what I expect from a cDNA library). Here is the mapping before re-orientation:

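[image: reorient_question-original-libraries] https://user-images.githubusercontent.com/8793228/66372762-1f924280-e9a7-11e9-8461-ded83769a1ba.png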

After re-orientation, I assumed that the libraries are now equivalent to direct-RNA-seq libraries. Is that a correct assumption?

If this is the case, I was expecting the read orientation to match the strand of the gene (whether it is forward or reverse), but in my case they do not match and I was wondering why.

Thanks again, Amira

EduEyras commented 4 years ago

Hi Amira,

Yes, after ReorientExpress the reads should be oriented as expected from direct RNA (at least ~85% of them).

If you were to map those reads to the transcripts directly (transcript sequences) rather than to the genome, reads should map without having to revcomp them, hence both forward.

When mapping to the genome, minimap2 does not know about gene strands. But it might be that it has a preference for mapping to the forward strand and is revcomp'ing the query?
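For reference, "revcomp'ing" a read means taking its reverse complement. In SAM output, a read aligned to the reverse strand (FLAG bit 0x10) has its SEQ field stored already reverse-complemented, so the sequence as it appeared in the FASTQ can be recovered by reverse-complementing it again. A minimal Python sketch of that step, with illustrative function names and a made-up sequence:

```python
# Translation table for DNA bases; N stays N, case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def original_read(seq: str, flag: int) -> str:
    """Recover the read as sequenced from a SAM record's SEQ and FLAG.

    SEQ is stored reverse-complemented when bit 0x10 is set.
    """
    return revcomp(seq) if flag & 0x10 else seq

print(revcomp("AACGT"))            # ACGTT
print(original_read("AACGT", 16))  # ACGTT
print(original_read("AACGT", 0))   # AACGT
```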

E.
