dieterich-lab / JACUSA2

New version of JACUSA -> 2.0
GNU General Public License v3.0

rt-arrest discussion #15

Closed CDieterich closed 5 years ago

CDieterich commented 5 years ago

I am coming back to our discussion on which read side to consider in the case of rt-arrest.

As the name "rt-arrest" clearly indicates, we should only speak about fragments and not about mates in paired-end sequencing. An rt-arrest is only defined relative to the fragment, so looking at both reads in a pair is useless.

So, for single-end sequencing it is clear:

In case of firststrand (cDNA synthesis), we consider the end of the read for rt-arrest.
In case of secondstrand (cDNA synthesis), we consider the start of the read for rt-arrest.

For paired-end sequencing it is a little more complicated:

In case of firststrand (rev-fwd pair), we should consider the start of the second mate.
In case of secondstrand (fwd-rev pair), we should consider the start of the first mate.

Does this make sense? @Mitschka - please implement.
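The proposal in this comment can be written down as a small lookup. This is only a sketch of the rule as stated here (the later comments in this thread revise it), not JACUSA2 code; the class, enum, and method names are hypothetical and chosen for illustration.

```java
// Sketch of the rule proposed in the comment above.
// All names are hypothetical; this is not the JACUSA2 API.
final class ProposedArrestSide {

    enum LibraryType { FIRSTSTRAND, SECONDSTRAND }

    enum ReadSide {
        END_OF_READ,          // single-end, firststrand
        START_OF_READ,        // single-end, secondstrand
        START_OF_SECOND_MATE, // paired-end, firststrand (rev-fwd pair)
        START_OF_FIRST_MATE   // paired-end, secondstrand (fwd-rev pair)
    }

    /** Returns the read side that the comment proposes to inspect for rt-arrest. */
    static ReadSide arrestSide(LibraryType library, boolean pairedEnd) {
        if (library == null) {
            throw new IllegalArgumentException("library type must not be null");
        }
        if (pairedEnd) {
            return library == LibraryType.FIRSTSTRAND
                    ? ReadSide.START_OF_SECOND_MATE
                    : ReadSide.START_OF_FIRST_MATE;
        }
        return library == LibraryType.FIRSTSTRAND
                ? ReadSide.END_OF_READ
                : ReadSide.START_OF_READ;
    }

    public static void main(String[] args) {
        // Print the proposed side for every combination.
        for (LibraryType library : LibraryType.values()) {
            System.out.println(library + " SE -> " + arrestSide(library, false));
            System.out.println(library + " PE -> " + arrestSide(library, true));
        }
    }
}
```

Keeping the decision in one function makes it easy to swap the mapping if the discussion settles on a different rule.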

piechottam commented 5 years ago

I agree with you, taking the following into account:

RF-FIRSTSTRAND:
1 2 Mate
R F orientation to template
-> <- PE reads
-------> first cDNA synthesis
<-------*-- RNA Template

FR-SECONDSTRAND:
2 1 Mate
R F orientation to template
-> < PE reads
<------ second cDNA synthesis
-------> first cDNA synthesis
<-------*-- RNA Template

In cases where the mate of a PE read that should be used to infer the arrest position does not exist - because it is unmapped - there will be NO arrest position. Currently, such a PE read will have only read-through positions.
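A minimal sketch of this unmapped-mate behaviour, assuming a hypothetical `Mate` record that carries a mapped flag and a candidate arrest coordinate. None of these names come from JACUSA2, and which coordinate serves as the candidate (alignment start or end) is deliberately left open here.

```java
import java.util.OptionalInt;

// Sketch only: hypothetical types, not the JACUSA2 LocationInterpreter.
final class MateArrestSketch {

    /** Hypothetical minimal view of one mate of a paired-end read. */
    static final class Mate {
        final boolean mapped;
        final int candidateArrestPosition; // genomic coordinate, only meaningful if mapped

        Mate(boolean mapped, int candidateArrestPosition) {
            this.mapped = mapped;
            this.candidateArrestPosition = candidateArrestPosition;
        }
    }

    /**
     * Returns the arrest position inferred from the mate that defines it,
     * or an empty result when that mate is missing or unmapped.
     */
    static OptionalInt arrestPosition(Mate arrestMate) {
        if (arrestMate == null || !arrestMate.mapped) {
            return OptionalInt.empty(); // NO arrest position; read-through positions only
        }
        return OptionalInt.of(arrestMate.candidateArrestPosition);
    }

    public static void main(String[] args) {
        System.out.println(arrestPosition(new Mate(true, 1200)).isPresent());
        System.out.println(arrestPosition(new Mate(false, 0)).isPresent());
    }
}
```

Returning an empty `OptionalInt` rather than a sentinel such as `-1` forces callers to handle the "no arrest position" case explicitly.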

In order to account for this new PE behaviour, I adjusted the code of the interface "LocationInterpreter" that defines which region of a read counts as read-through and which as read arrest. I added preliminary JUnit tests to test the behaviour - I'll keep you posted.

piechottam commented 5 years ago

Using the following slide (SE/PE Lib.) and looking at the alignment positions of the RNA fragment to define read start/end, I come to the following results for defining the arrest position:

When gene is found on forward DNA strand:

| Lib. type | SE | PE |
|---|---|---|
| RF FIRSTSTRAND | end of read alignment | end of fragment alignment |
| FR SECONDSTRAND | end of read alignment | end of fragment alignment |

Basically, it is always the alignment end! When the gene is found on the reverse DNA strand, we need to look at the alignment start instead!

e.g.: DNA with gene on forward strand

5--coding--3 forward
3-template-5 reverse
(* := will be modified in RNA)

Transcription yields 5-mRNA--*--3, which corresponds to coding.

First cDNA synthesis of 5-mRNA----3 yields 3-cDNA--5; the cDNA corresponds to template.

From this, it is immediately clear that for genes on the forward strand we need to look at the alignment end to identify the arrest position. For genes on the reverse strand, we need to look at the alignment start!

Please comment!

CDieterich commented 5 years ago

I think we have to be careful with our definitions of start and end. Do we speak about reads, cDNA, or genomic coordinates? While I agree with your rf-firststrand interpretation, I am not sure about your fr-secondstrand. I think it should be the start of the first read, as shown in your picture.

piechottam commented 5 years ago

The arrest site will be incorporated during 1st cDNA synthesis:

DNA
5------3 coding
3------5 template

5-----3 mRNA
3---5 cDNA

(Everything DOWNSTREAM of * will be ignored in any following PCR, e.g. second strand synthesis.) The arrest site will always be at the 3' end - the alignment end - in genomic coordinates!

This corresponds to the alignment end of the READ (in SE) or FRAGMENT (in PE). Depending on the library type, this is the alignment end of either the 1st or the 2nd mate:

RF_FIRSTSTRAND: 2-mate-F=>---<=R-1mate
FR_SECONDSTRAND: 1-mate-F=>---<=R-2mate

If the gene is on the REVERSE strand, then we should be looking at the alignment start. Does this make sense?

CDieterich commented 5 years ago

No, it is the other way around. Take a look at

5-----3 mRNA --5 cDNA

(everything UPSTREAM of * will be ignored in any following PCR, e.g. second strand synthesis). I will draw a picture for you.

piechottam commented 5 years ago

I think I got it now - cDNA synthesis goes from 5' -> 3' because of the 3'-primer!

It is basically inverted:
Forward strand -> alignment start
Reverse strand -> alignment end
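The inverted rule stated in this comment could be sketched as follows. This is an illustration under assumptions, not the JACUSA2 implementation: the names are hypothetical, and the coordinates are taken to be the genomic alignment start and end of the read (SE) or of the fragment (PE).

```java
// Sketch of the rule "forward strand -> alignment start, reverse strand -> alignment end".
// Hypothetical names; coordinates are genomic, with alignmentStart <= alignmentEnd.
final class StrandArrestSketch {

    enum Strand { FORWARD, REVERSE }

    /** Picks the genomic coordinate that marks the arrest position for the given gene strand. */
    static int arrestPosition(Strand geneStrand, int alignmentStart, int alignmentEnd) {
        if (geneStrand == null) {
            throw new IllegalArgumentException("gene strand must not be null");
        }
        if (alignmentStart > alignmentEnd) {
            throw new IllegalArgumentException("alignmentStart must not exceed alignmentEnd");
        }
        return geneStrand == Strand.FORWARD ? alignmentStart : alignmentEnd;
    }

    public static void main(String[] args) {
        // A read or fragment aligned to genomic positions 100..175.
        System.out.println(arrestPosition(Strand.FORWARD, 100, 175)); // prints 100
        System.out.println(arrestPosition(Strand.REVERSE, 100, 175)); // prints 175
    }
}
```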

CDieterich commented 5 years ago

Exactly

piechottam commented 5 years ago

Okay - I am going to finalize it...