CDieterich closed 5 years ago
I agree with you, taken the following into account:
RF-FIRSTSTRAND:
1 2 Mate
R F orientation to template
-> <- PE reads
-------> first cDNA synthesis
<-------*-- RNA Template

FR-SECONDSTRAND:
2 1 Mate
R F orientation to template
-> <- PE reads
<------ second cDNA synthesis
-------> first cDNA synthesis
<-------*-- RNA Template
If the mate of a PE read that should be used to infer the arrest position does not exist because it is unmapped, there will be NO arrest position. Currently, such a PE read contributes only read-through positions.
To account for this new PE behaviour, I adjusted the code of the interface "LocationInterpreter", which defines which region of a read counts as read-through and which as read arrest. I added preliminary JUnit tests for this behaviour - I'll keep you posted.
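The unmapped-mate case described above could be sketched roughly as follows. This is only an illustration under assumptions: `ArrestMateSketch`, `Mate` and `arrestPosition` are hypothetical names, not the actual "LocationInterpreter" API, and which alignment boundary is used is passed in as a flag because that question is still being discussed in this thread.

```java
import java.util.OptionalInt;

// Hypothetical sketch; names and types are NOT taken from the project's code.
final class ArrestMateSketch {

    /** Minimal stand-in for an aligned mate (assumed genomic coordinates). */
    static final class Mate {
        final boolean mapped;
        final int alignmentStart;
        final int alignmentEnd;

        Mate(boolean mapped, int alignmentStart, int alignmentEnd) {
            this.mapped = mapped;
            this.alignmentStart = alignmentStart;
            this.alignmentEnd = alignmentEnd;
        }
    }

    /**
     * Returns the arrest position taken from the mate that defines it,
     * or an empty result when that mate is missing or unmapped.
     * useAlignmentEnd selects which alignment boundary is reported.
     */
    static OptionalInt arrestPosition(Mate arrestMate, boolean useAlignmentEnd) {
        if (arrestMate == null || !arrestMate.mapped) {
            // No arrest position: the pair only contributes read-through positions.
            return OptionalInt.empty();
        }
        return OptionalInt.of(useAlignmentEnd ? arrestMate.alignmentEnd
                                              : arrestMate.alignmentStart);
    }
}
```

Returning an empty `OptionalInt` instead of a sentinel such as -1 forces callers to handle the "no arrest position" case explicitly.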
Using the following slide on SE/PE libraries and looking at the alignment positions of the RNA fragment to define read start/end, I arrive at the following rule for the arrest position:
When gene is found on forward DNA strand:
Lib. type | SE | PE |
---|---|---|
RF FIRSTSTRAND | end of read alignment | end of fragment alignment |
FR SECONDSTRAND | end of read alignment | end of fragment alignment |
Basically, it is always alignment end! When gene is found on reverse DNA strand, then we need to look at the alignment start!
E.g., DNA with gene on forward strand (* := will be modified in RNA):
5--coding--3 forward
3-template-5 reverse
Transcription yields 5-mRNA--*--3, which corresponds to coding.
First cDNA synthesis of 5-mRNA--*--3 yields 3-cDNA--5; the cDNA corresponds to template.
From this, it is immediately clear that for genes on forward strand, we need to look at alignment end to identify the arrest position. For genes on reverse strand, we need to look at alignment start!
Pls, comment!
I think we have to be careful with our definitions of start and end. Are we talking about reads, cDNA or genomic coordinates? While I agree with your rf-firststrand interpretation, I am not sure about your fr-secondstrand one. I think it should be the start of the first read, as shown in your picture.
The arrest site will be incorporated during 1st cDNA synthesis:
DNA 5------3 coding
3------5 template
5-----3 mRNA
3---5 cDNA
(everything DOWNSTREAM of * will be ignored in any following PCR, e.g. second strand synthesis). The arrest site will always be at the 3' end - alignment end - in genomic coordinates!
This corresponds to the alignment end of the READ (in SE) or FRAGMENT (in PE). Depending on the library type, this is the alignment end of either the 1st or the 2nd mate:
RF_FIRSTSTRAND: 2-mate-F=>---<=R-1mate
FR_SECONDSTRAND: 1-mate-F=>---<=R-2mate
If gene is on REVERSE strand then we should be looking at alignment start. Does this make sense?
No, it is the other way around. Take a look at
5-----3 mRNA --5 cDNA
(everything UPSTREAM of * will be ignored in any following PCR, e.g. second strand synthesis). I will draw a picture for you.
I think I got it now - cDNA synthesis goes from 5' -> 3' because of the 3'-primer!
It is basically inverted:
Forward strand -> alignment start
Reverse strand -> alignment end
Exactly
Okay - I am going to finalize it...
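The rule agreed on above (forward strand -> alignment start, reverse strand -> alignment end) could be expressed roughly like this. It is a minimal sketch: `StrandRuleSketch`, `Strand` and `arrestBoundary` are hypothetical names chosen for illustration, and the coordinates are assumed to be the genomic alignment boundaries of the read (SE) or fragment (PE).

```java
// Hypothetical sketch; not the project's actual implementation.
final class StrandRuleSketch {

    enum Strand { FORWARD, REVERSE }

    /**
     * Picks the alignment boundary that marks the arrest position:
     * alignment start for genes on the forward strand,
     * alignment end for genes on the reverse strand.
     */
    static int arrestBoundary(Strand geneStrand, int alignmentStart, int alignmentEnd) {
        return geneStrand == Strand.FORWARD ? alignmentStart : alignmentEnd;
    }
}
```

For example, with an alignment spanning positions 100 to 150, a forward-strand gene would give 100 and a reverse-strand gene 150.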
I am coming back to our discussion on which read side to consider in the case of rt-arrest.
As the name "rt-arrest" clearly indicates, we should only speak about fragments and not about mates in paired-end sequencing. An rt-arrest is only defined relative to the fragment, so looking at both reads in a pair is useless.
So, for single-end sequencing it is clear:
In case of firststrand (cDNA synthesis), we consider the end of the read for rt-arrest.
In case of secondstrand (cDNA synthesis), we consider the start of the read for rt-arrest.
For paired-end sequencing it is a little more complicated:
In case of firststrand (rev-fwd pair), we should consider the start of the second mate.
In case of secondstrand (fwd-rev pair), we should consider the start of the first mate.
Does this make sense? @Mitschka - please implement
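As a starting point for the implementation, the four cases above could be encoded like this. This is a sketch only: `ArrestSideSketch` and its nested types are hypothetical names, and "start"/"end" are used exactly as in the comment above, without fixing whether they refer to read or genomic coordinates.

```java
// Hypothetical sketch of the SE/PE rule; names are illustrative only.
final class ArrestSideSketch {

    enum LibraryType { RF_FIRSTSTRAND, FR_SECONDSTRAND }
    enum Mate { SINGLE_READ, FIRST_MATE, SECOND_MATE }
    enum Side { START, END }

    /** Which read, and which side of it, defines the rt-arrest. */
    static final class Choice {
        final Mate mate;
        final Side side;

        Choice(Mate mate, Side side) {
            this.mate = mate;
            this.side = side;
        }
    }

    static Choice choose(LibraryType lib, boolean pairedEnd) {
        boolean firstStrand = lib == LibraryType.RF_FIRSTSTRAND;
        if (!pairedEnd) {
            // SE: end of the read for firststrand, start of the read for secondstrand.
            return new Choice(Mate.SINGLE_READ, firstStrand ? Side.END : Side.START);
        }
        // PE: start of the second mate for firststrand, start of the first mate for secondstrand.
        return new Choice(firstStrand ? Mate.SECOND_MATE : Mate.FIRST_MATE, Side.START);
    }
}
```

Keeping the decision in one small function makes each of the four cases straightforward to cover with a JUnit test.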