Dear DaPars2 developers,
I work with long-read (LR) scRNA-seq data and wanted to compare 3' UTRs between cell types. I gave DaPars2 a try, but I realised that the breakpoint detection does not work properly for LR data, as reported in #22.
I therefore took DaPars2_Multi_Sample_Multi_Chr.py as a starting point and adapted it for LR data (DaPars2_Two_Samples_Multi_Chr_LR.py).
Briefly, I first find the last covered position in the UTR (last_cov, using a coverage threshold that is currently min_cov = 10) and restrict the search region to the interval between the UTR start and last_cov. I then look for the breakpoint in the same fashion as DaPars. For each position x in that region, I:
- define Long_UTR_abun as the mean coverage over the 50 bp before last_cov;
- define Short_UTR_abun as the difference between the mean coverage over the 50 bp before x and the mean coverage over the 50 bp after x.

The breakpoint is then the position x with the largest squared Short_UTR_abun value.
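The search described above can be sketched roughly as follows. This is a minimal illustration, not the code in DaPars2_Two_Samples_Multi_Chr_LR.py: the function name `find_lr_breakpoint`, the `window` parameter, and the handling of UTR edges are assumptions, and `coverage` is assumed to be a per-base coverage list already oriented 5' to 3'.

```python
from statistics import mean


def find_lr_breakpoint(coverage, min_cov=10, window=50):
    """Return (breakpoint, long_utr_abun, short_utr_abun), or None if no call is possible.

    `coverage` is per-base coverage along the 3' UTR, oriented 5' -> 3'
    (assumption: minus-strand UTRs have been reversed beforehand).
    """
    # Last position whose coverage reaches min_cov; the search ends there.
    covered = [i for i, c in enumerate(coverage) if c >= min_cov]
    if not covered:
        return None
    last_cov = covered[-1]

    # Assumption: skip UTRs too short to hold one window on each side of x.
    if last_cov < 2 * window:
        return None

    # Long-isoform abundance: mean coverage over the window before last_cov.
    long_utr_abun = mean(coverage[last_cov - window:last_cov])

    best = None  # (squared difference, position, signed difference)
    for x in range(window, last_cov - window + 1):
        before = mean(coverage[x - window:x])
        after = mean(coverage[x:x + window])
        short_utr_abun = before - after
        score = short_utr_abun ** 2
        if best is None or score > best[0]:
            best = (score, x, short_utr_abun)

    _, breakpoint, short_utr_abun = best
    return breakpoint, long_utr_abun, short_utr_abun


# Toy profile: a sharp drop from 100x to 30x at index 200, no reads after index 399.
toy_coverage = [100] * 200 + [30] * 200 + [0] * 100
result = find_lr_breakpoint(toy_coverage)
```

On this toy profile the largest squared difference sits exactly at the step in coverage, which is the behaviour the method relies on for long reads. Recomputing the window means at every position is quadratic in the worst case; a running sum would be the obvious optimisation.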
It can probably be improved, and I'd be very happy to hear any suggestions, but it already gives convincing results (sadly, I can't share all of them yet):
The original DaPars found (see image below):

| Gene | fit_value | Predicted_Proximal_APA | Loci | Red | Green |
| --- | --- | --- | --- | --- | --- |
| ENST00000361866.8\|COL6A1\|chr21\|+ | 1298.1 | 46003542 | chr21:46003391-46005048 | 1.00 | 1.00 |

And the LR method:

| Gene | fit_value | Predicted_Proximal_APA | Loci | Red | Green |
| --- | --- | --- | --- | --- | --- |
| ENST00000361866.8\|COL6A1\|chr21\|+ | 1764.4 | 46004101 | chr21:46003391-46005048 | 0.90 | 0.71 |

The LR result is much closer to the data.
Currently, DaPars2_Two_Samples_Multi_Chr_LR.py only works for 2 samples, because I wanted to compute the Fisher's exact test p-value, but it can easily be changed into a multi-sample version without the Fisher test.
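For readers unfamiliar with the two-sample test, here is a small self-contained sketch of a two-sided Fisher's exact test on a 2x2 table. It assumes the table holds rounded short and long UTR abundances for the two samples, which is an assumption about the layout and not taken from the script itself; the function name and the example counts are hypothetical. If SciPy is available, `scipy.stats.fisher_exact` does the same job.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Assumed layout (hypothetical): rows are the two samples, columns are
    rounded short and long UTR abundances.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum over all tables that are at most as likely as the observed one;
    # the small tolerance guards against floating-point ties.
    p = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts: sample 1 is mostly short isoform, sample 2 mostly long.
p_value = fisher_exact_two_sided(30, 10, 12, 28)
```

Because the test needs exactly two rows, a multi-sample version would have to drop it or swap in a different test.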
I also added a merge_Dapars.py script that merges the results from all chromosomes, performs BH FDR correction, computes DPUI and keeps the site with the highest DPUI per gene, but it is only an accessory script.
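The merging steps can be illustrated with a short pure-Python sketch. It is not the content of merge_Dapars.py: the row layout, the key names (`gene`, `site`, `pvalue`, `dpui`) and the example values are all hypothetical, and it assumes the per-chromosome results have already been read into a list of dictionaries.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def best_site_per_gene(rows, gene_key="gene", score_key="dpui"):
    """Keep, for each gene, the row with the highest score."""
    best = {}
    for row in rows:
        gene = row[gene_key]
        if gene not in best or row[score_key] > best[gene][score_key]:
            best[gene] = row
    return list(best.values())


# Hypothetical merged rows from several chromosomes.
rows = [
    {"gene": "GENE_A", "site": 1000, "pvalue": 0.01, "dpui": 0.20},
    {"gene": "GENE_A", "site": 1500, "pvalue": 0.04, "dpui": 0.35},
    {"gene": "GENE_B", "site": 2000, "pvalue": 0.03, "dpui": 0.10},
    {"gene": "GENE_C", "site": 3000, "pvalue": 0.005, "dpui": 0.50},
]

# Correct across all sites first, then pick one site per gene.
for row, padj in zip(rows, benjamini_hochberg([r["pvalue"] for r in rows])):
    row["padj"] = padj
selected = best_site_per_gene(rows)
```

Here the correction is applied over all candidate sites before one site per gene is selected, following the order of steps listed above; correcting after selection would test fewer hypotheses and give different adjusted values.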