DaPars 2 for long reads

Dear DaPars2 developers,

I work with long-read (LR) scRNA-seq data and wanted to compare 3' UTRs between cell types. I gave DaPars2 a try, but I realised that the breakpoint detection does not work properly for LR data, as stated in #22.

I then used DaPars2_Multi_Sample_Multi_Chr.py and adapted it for LR (DaPars2_Two_Samples_Multi_Chr_LR.py).

Briefly, I first find the last covered point in the UTR (last_cov), i.e. the last position whose coverage reaches min_cov (currently min_cov = 10), and define the search region as the interval between the UTR start and last_cov. I then look for the breakpoint in the same fashion as DaPars. For each position x in the search region, I:

- define Long_UTR_abun as the mean coverage over the 50 bp before last_cov
- define Short_UTR_abun as the difference between the mean coverage over the 50 bp before x and the mean coverage over the 50 bp after x

The breakpoint is then the position x with the largest squared Short_UTR_abun value.
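The steps above can be sketched roughly as follows. This is only a minimal illustration, not the code in DaPars2_Two_Samples_Multi_Chr_LR.py: the function name `find_lr_breakpoint`, the `window` parameter, and the choice to skip positions that do not have a full 50 bp window on both sides are assumptions made for the example. Coverage is assumed to be a per-base list oriented 5' to 3', with index 0 at the UTR start.

```python
def find_lr_breakpoint(coverage, min_cov=10, window=50):
    """Return (breakpoint_index, short_utr_abun, long_utr_abun), or None.

    coverage: per-base read depth along the 3' UTR (index 0 = UTR start).
    Hypothetical helper for illustration only.
    """
    # Last position whose coverage reaches min_cov (last_cov).
    covered = [i for i, depth in enumerate(coverage) if depth >= min_cov]
    if not covered:
        return None
    last_cov = covered[-1]

    # Assumption: require room for a full window on each side of x.
    if last_cov < 2 * window:
        return None

    # Long_UTR_abun: mean coverage over the window just before last_cov.
    long_abun = sum(coverage[last_cov - window:last_cov]) / window

    best = None  # (score, x, short_abun)
    for x in range(window, last_cov - window + 1):
        before = sum(coverage[x - window:x]) / window
        after = sum(coverage[x:x + window]) / window
        # Short_UTR_abun: drop in mean coverage across x.
        short_abun = before - after
        score = short_abun ** 2
        if best is None or score > best[0]:
            best = (score, x, short_abun)

    return best[1], best[2], long_abun
```

For a synthetic profile with a single step, e.g. `[100] * 200 + [40] * 200 + [0] * 100`, this sketch places the breakpoint at the step (index 200).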

It can probably be improved, and I'd be very happy to hear any suggestions, but it already gives convincing results (sadly, I can't share all of them yet):

The original DaPars found (see image below):

| Gene | fit_value | Predicted_Proximal_APA | Loci | Red | Green |
| --- | --- | --- | --- | --- | --- |
| ENST00000361866.8\|COL6A1\|chr21\|+ | 1298.1 | 46003542 | chr21:46003391-46005048 | 1.00 | 1.00 |

And the LR method:

| Gene | fit_value | Predicted_Proximal_APA | Loci | Red | Green |
| --- | --- | --- | --- | --- | --- |
| ENST00000361866.8\|COL6A1\|chr21\|+ | 1764.4 | 46004101 | chr21:46003391-46005048 | 0.90 | 0.71 |

The LR prediction is much closer to the data.

Currently, DaPars2_Two_Samples_Multi_Chr_LR.py only works for 2 samples, because I wanted to compute the Fisher's exact test p-value, but it can easily be changed into a multi-sample version without the Fisher test.

I also added a merge_Dapars.py script that merges the results from all chromosomes, performs BH FDR correction, computes DPUI and keeps the site with the highest DPUI per gene, but it is only an accessory.
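For readers who want to see the shape of that post-processing, here is a minimal sketch of the BH correction and the per-gene selection. It is not the merge_Dapars.py script itself: the function names and the row keys `gene`, `pvalue` and `dpui` are hypothetical, the DPUI value is assumed to be already computed for each row, and "highest DPUI" is assumed here to mean the largest absolute value.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def merge_and_select(rows):
    """Add a 'padj' field and keep one row per gene.

    rows: list of dicts pooled from all chromosomes, with the
    hypothetical keys 'gene', 'pvalue' and 'dpui'.
    """
    padj = bh_adjust([row["pvalue"] for row in rows])
    best = {}
    for row, q in zip(rows, padj):
        row = dict(row, padj=q)
        current = best.get(row["gene"])
        # Assumption: "highest DPUI" is taken as the largest absolute value.
        if current is None or abs(row["dpui"]) > abs(current["dpui"]):
            best[row["gene"]] = row
    return list(best.values())
```

In this sketch the correction is applied to all sites before the per-gene filtering; applying it after filtering would change the number of tests and is an equally plausible design.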
