3UTR / DaPars2

Dynamics analysis of Alternative PolyAdenylation from RNA-seq
GNU General Public License v2.0

Breakpoint detection does not work in long read data #22

Open ArthurDondi opened 1 year ago

ArthurDondi commented 1 year ago

Dear DaPars2 developers,

Because of the shape of long-read 3'UTR coverage profiles (see image below), DaPars2's breakpoint detection is not accurate on this kind of data: it assumes uniform coverage on each side of the breakpoint, whereas long-read coverage decreases steadily along the 3'UTR.
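To make the mismatch concrete, here is a minimal sketch in Python. It is a simplified stand-in for a two-plateau fit, not DaPars2's actual code: the function `fit_step_breakpoint` and both synthetic coverage vectors are made up for illustration. It picks the split that minimizes the squared error around one mean per side.

```python
def sse(values):
    """Sum of squared deviations from the mean of `values`."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def fit_step_breakpoint(coverage):
    """Return the split index k that best fits two flat plateaus.

    Hypothetical helper: coverage[:k] and coverage[k:] are each
    approximated by their own mean, and k minimizes the total error.
    """
    best_k, best_err = None, float("inf")
    for k in range(1, len(coverage)):
        err = sse(coverage[:k]) + sse(coverage[k:])
        if err < best_err:
            best_k, best_err = k, err
    return best_k


# Synthetic short-read-like profile: a clean step at index 60.
step = [10.0] * 60 + [4.0] * 40

# Synthetic long-read-like profile: a steady decline with no step at all.
ramp = [float(v) for v in range(100, 0, -1)]

step_bp = fit_step_breakpoint(step)
ramp_bp = fit_step_breakpoint(ramp)
print(step_bp, ramp_bp)
```

On the step-shaped profile the fit recovers the true change at index 60. On the steadily declining profile there is no breakpoint to find, yet the fit still reports one at the midpoint (index 50), because splitting a ramp in half is what minimizes the error.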

For the COL6A1 example below, DaPars2 reports:

Gene                              fit_value  Predicted_Proximal_APA  Loci                     Red   Green
ENST00000361866.8|COL6A1|chr21|+  1298.1     46003542                chr21:46003391-46005048  1.00  1.00

The correct result should be something like:

Gene                              fit_value  Predicted_Proximal_APA  Loci                     Red   Green
ENST00000361866.8|COL6A1|chr21|+  XXXX       46004100                chr21:46003391-46005048  0.90  0.70

So it finds an incorrect breakpoint, which in turn leads to incorrect PDUI values.
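A small Python sketch of this knock-on effect on the usage index follows. It assumes the usual two-isoform reading of the profile, where coverage upstream of the breakpoint is long plus short isoform and coverage downstream is long isoform only, and it computes the index (PDUI) as long / (long + short). The helper `pdui_at` and the numbers are hypothetical, not taken from DaPars2.

```python
def pdui_at(coverage, k):
    """Distal usage index implied by placing the breakpoint at index k.

    Assumption for illustration: upstream mean = long + short isoform,
    downstream mean = long isoform only.
    """
    upstream = sum(coverage[:k]) / k
    downstream = sum(coverage[k:]) / (len(coverage) - k)
    long_iso = downstream
    short_iso = max(upstream - downstream, 0.0)
    total = long_iso + short_iso
    if total == 0:
        return float("nan")  # no coverage, index undefined
    return long_iso / total


# Synthetic profile with a true breakpoint at index 60.
coverage = [10.0] * 60 + [4.0] * 40

pdui_true = pdui_at(coverage, 60)   # breakpoint in the right place
pdui_early = pdui_at(coverage, 30)  # breakpoint called too early
print(round(pdui_true, 2), round(pdui_early, 2))
```

With the breakpoint placed too early (index 30 instead of 60), the index rises from 0.40 to about 0.66 for the same reads. That is the same direction of error as in the example above, where 1.00 is reported instead of something like 0.90 and 0.70.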

DaPars2 does not claim to work on long reads, but I thought it would be nice to have a version that handles them.

[Image: Screenshot 2023-06-30 at 12 09 54]