databio / GenomicDistributions

Calculate and plot distributions of genomic ranges
http://code.databio.org/GenomicDistributions

Distance to TSSs is wrong #173

Closed nsheff closed 2 years ago

nsheff commented 2 years ago

In BEDbase, distance calculations to TSS are still incorrect.

Noticed by @Khoroshevskyi

kkupkova commented 2 years ago

Can you give an example where it's wrong? Our previous test datasets passed.

nsheff commented 2 years ago

@Khoroshevskyi can you follow up?

khoroshevskyi commented 2 years ago

I am in contact with Kristyna. It looks like the distance-to-TSS values are correct; the absolute distances to TSS are simply very large. If the values are the same after a second check, we will close the issue.

khoroshevskyi commented 2 years ago

@nsheff @kkupkova I tried calculating the mean absolute distance to TSS with rGREAT. The results are similar but not the same (I checked with different options: oneClosest, twoClosest, ...). If you are sure that the results are correct, then we can close this issue.
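For reference, a minimal sketch of what "mean absolute distance to the nearest TSS" means on a single chromosome. This is not the implementation used by GenomicDistributions or rGREAT; the function name `nearest_tss_distance` and the toy coordinates are hypothetical, and strand and multiple chromosomes are ignored. Tools that associate regions with genes under different rules can legitimately give somewhat different numbers.

```python
from bisect import bisect_left
from statistics import mean


def nearest_tss_distance(position, sorted_tss):
    """Absolute distance from one position to the closest TSS.

    `sorted_tss` must be a non-empty, ascending list of TSS coordinates
    on the same chromosome as `position`.
    """
    i = bisect_left(sorted_tss, position)
    # Only the TSS just before and just after the position can be the nearest.
    candidates = sorted_tss[max(i - 1, 0):i + 1]
    return min(abs(position - t) for t in candidates)


# Hypothetical toy data: TSS coordinates and region midpoints, in bp.
tss = sorted([1_000, 5_000, 50_000])
region_midpoints = [1_200, 4_000, 30_000]

distances = [nearest_tss_distance(p, tss) for p in region_midpoints]
mean_abs_distance = mean(distances)
print(distances, mean_abs_distance)
```

Even in this toy case a single distant region dominates the mean, which is worth keeping in mind when comparing summary values between tools.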

nsheff commented 2 years ago

OK, I have now investigated this in depth. I think you are right that the calculation is (roughly) correct. The issue is that the distribution of distances to TSS resembles an exponential distribution with a very long tail. We have been reporting the mean of this distribution, which is a poor summary because it is heavily skewed by outliers.
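To illustrate the point, here is a small sketch comparing the mean and the median on a synthetic, exponential-like set of distances. The numbers (`scale`, the outlier value, the sample size) are made-up assumptions for illustration, not BEDbase data, and the median is shown only as one example of a more robust summary.

```python
import math
from statistics import mean, median

scale = 10_000  # hypothetical typical distance to TSS, in bp
n = 1_000

# Evenly spaced quantiles of an exponential distribution
# (deterministic stand-in for a random sample).
distances = [-scale * math.log(1 - (i + 0.5) / n) for i in range(n)]

# Add a handful of very distant regions to mimic the long tail.
with_outliers = distances + [5_000_000] * 5

print("no outliers:   mean", round(mean(distances)), "median", round(median(distances)))
print("with outliers: mean", round(mean(with_outliers)), "median", round(median(with_outliers)))
```

Adding five extreme values to a thousand regions shifts the mean several-fold, while the median barely moves.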

I will close this issue and re-raise in bedstat.