NCI-RBL / iCLIP

RNA Biology Pipeline to Characterize protein-RNA Interactions
https://rbl-nci.github.io/iCLIP/
MIT License

Investigate overhangs and GAPS #110

Closed slsevilla closed 2 years ago

slsevilla commented 2 years ago

Assessment: Each read's alignment was assessed for gaps, with two possible scenarios:

1. alignment - gap - alignment

2. alignment - gap - insertion/deletion - alignment
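
As a rough illustration of how the two scenarios could be told apart, here is a minimal sketch that classifies a SAM CIGAR string. It assumes gaps are reported as `N` operations (as in the SAM specification) and that an insertion/deletion counts toward the second scenario when an `I` or `D` operation sits directly next to the `N`. The function names and return labels are hypothetical and are not taken from the pipeline.

```python
import re

# One CIGAR element: a length followed by a SAM operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a SAM CIGAR string into (length, op) tuples."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def classify_gaps(cigar):
    """Return 'no_gap', 'gap_only' or 'gap_with_indel' (hypothetical labels).

    Assumption: a gap is an N operation, and the indel scenario applies
    when an I or D operation is directly adjacent to any N operation.
    """
    ops = [op for _, op in parse_cigar(cigar)]
    if "N" not in ops:
        return "no_gap"
    for i, op in enumerate(ops):
        if op == "N":
            neighbours = ops[max(i - 1, 0):i] + ops[i + 1:i + 2]
            if "I" in neighbours or "D" in neighbours:
                return "gap_with_indel"
    return "gap_only"


print(classify_gaps("2M100N70M"))
print(classify_gaps("2M100N3D70M"))
```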

All reads matching outcome #1 were counted. For gaps of length 100 or more, overhang and parent were assigned to the two flanking alignments as follows:

3. overhang: the smaller length of the two alignments

4. parent: the larger length of the two alignments

A ratio of overhang to parent was calculated for each read that met this criterion. For example, #5 would have a ratio of 2/70, #6 would also have a ratio of 2/70, and #7 would not be included because its gap is <100.

5. alignment1(N=2) - gap(N=100) - alignment2(N=70)

6. alignment3(N=70) - gap(N=100) - alignment4(N=2)

7. alignment3(N=70) - gap(N=10) - alignment4(N=2)
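
The single-gap rule can be sketched as below. This is only an illustration under the assumptions stated in the text (threshold of 100, overhang is the shorter flank, parent is the longer flank). The function name and the `min_gap` parameter are hypothetical.

```python
def overhang_parent(left_len, gap_len, right_len, min_gap=100):
    """Return (overhang, parent) for one gap, or None if the gap is too short.

    Assumption: a gap qualifies when its length is at least min_gap.
    """
    if gap_len < min_gap:
        return None
    overhang = min(left_len, right_len)  # shorter flanking alignment
    parent = max(left_len, right_len)    # longer flanking alignment
    return overhang, parent


# The three single-gap examples from the text.
for left, gap, right in [(2, 100, 70), (70, 100, 2), (70, 10, 2)]:
    result = overhang_parent(left, gap, right)
    if result is None:
        print(f"gap={gap}: not included")
    else:
        overhang, parent = result
        print(f"gap={gap}: ratio {overhang}/{parent}")
```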

For reads with two or more gaps, each gap was assessed individually. For example, #8 would have 2 gaps assessed, #9 would have 3 gaps assessed, and #10 would have 1 gap assessed, as gap2 does not meet the threshold of 100.

8. alignment1(N=2) - gap1(N=100) - alignment2(N=70) - gap2(N=100) - alignment4(N=2)

9. alignment3(N=70) - gap1(N=100) - alignment4(N=2) - gap2(N=100) - alignment4(N=2) - gap3(N=100) - alignment4(N=2)

10. alignment3(N=70) - gap1(N=100) - alignment4(N=2) - gap2(N=50) - alignment4(N=2)
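
A minimal sketch of the per-gap assessment for reads with several gaps follows. It assumes a read is given as an alternating list of lengths (alignment, gap, alignment, ...) and that the overhang and parent for each gap come from the two alignments directly flanking it. That pairing, the input layout and the function name are assumptions made for illustration.

```python
def assess_gaps(segments, min_gap=100):
    """Return one (overhang, parent) tuple per gap that meets the threshold.

    segments: alternating lengths [aln, gap, aln, gap, aln, ...]
    Assumption: each gap is paired with its two directly flanking alignments.
    """
    if len(segments) % 2 == 0:
        raise ValueError("expected alignment, gap, alignment, ... lengths")
    assessed = []
    for i in range(1, len(segments), 2):  # gaps sit at odd positions
        if segments[i] < min_gap:
            continue  # gap below the threshold is skipped
        left, right = segments[i - 1], segments[i + 1]
        assessed.append((min(left, right), max(left, right)))
    return assessed


# The three multi-gap examples from the text.
print(len(assess_gaps([2, 100, 70, 100, 2])))
print(len(assess_gaps([70, 100, 2, 100, 2, 100, 2])))
print(len(assess_gaps([70, 100, 2, 50, 2])))
```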

Tools: Output of NOVOALIGN was compared to that of STAR for this analysis. NOVOALIGN's settings do not give users as complete control over these scenarios as STAR's do.

slsevilla commented 2 years ago

Results for the varying parameters are reported here: https://github.com/RBL-NCI/iCLIP/issues/105