Consensus abs difference score

iromeo commented 6 years ago

Let's rank all consensus peaks using the following metric: for each consensus region 'r', #{OD intersecting 'r'} - #{YD intersecting 'r'}
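Here is a minimal shell sketch of this metric. It assumes a hypothetical tab-separated input with one consensus region per line and the OD and YD intersection counts already computed in columns 4 and 5. That layout and the `diff_score` / `rank_by_abs_diff` names are illustrative, not the project's actual script. The sketch appends the signed difference and its absolute value, then ranks by the absolute value, since the issue title refers to an absolute difference score.

```shell
# Hypothetical helpers: the input layout is an assumption for illustration.
# Input (TSV):  chrom  start  end  od_count  yd_count
# Output (TSV): chrom  start  end  od_count  yd_count  diff  abs_diff
# where diff = #{OD intersecting r} - #{YD intersecting r}.
diff_score() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         diff = $4 - $5
         absdiff = (diff < 0) ? -diff : diff
         print $1, $2, $3, $4, $5, diff, absdiff
       }'
}

# Rank regions by |diff| (column 7), largest first.
rank_by_abs_diff() {
  diff_score | sort -t "$(printf '\t')" -k7,7nr
}

# Usage example on made-up counts: prints chr1 (|diff| 7), chr2 (5), chr3 (0).
printf 'chr1\t100\t500\t9\t2\nchr2\t300\t700\t3\t8\nchr3\t50\t250\t4\t4\n' | rank_by_abs_diff
```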

iromeo commented 6 years ago

Initial implementation: simply uses bedtools multi-intersection regions.
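The following sketch shows one way to derive per-group counts from multi-intersection output. It rests on a few assumptions. `bedtools multiinter` prints, for each interval, the chromosome, start, end, the number of covering files, a comma-separated list of those files (using the labels given with `-names`), and then one 0/1 column per file. The sketch also assumes the samples were labelled with names starting with `OD` or `YD`. That naming convention and the `group_counts` helper are hypothetical, and the input below is hand-written in that layout rather than real bedtools output.

```shell
# Hypothetical post-processing of `bedtools multiinter -names ...` output.
# Uses columns 1-3 (interval) and 5 (comma-separated names of covering samples).
# Assumption: every sample name starts with "OD" or "YD" (the two age groups).
# Output (TSV): chrom start end od_count yd_count
group_counts() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         od = 0; yd = 0
         n = split($5, names, ",")
         for (i = 1; i <= n; i++) {
           if (names[i] ~ /^OD/) od++
           else if (names[i] ~ /^YD/) yd++
         }
         print $1, $2, $3, od, yd
       }'
}

# Hand-written lines in the multiinter layout (3 samples: OD1, OD2, YD1).
printf 'chr1\t100\t150\t1\tOD1\t1\t0\t0\nchr1\t150\t500\t3\tOD1,OD2,YD1\t1\t1\t1\n' | group_counts
```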

Cons: there are false positives, because peaks may be slightly shifted between samples
Pros: can detect changes in peak width
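To illustrate the first con, here is a toy multi-intersection written in awk. It is a simplified stand-in for illustration, not bedtools itself, and `toy_multiinter` is a hypothetical name. It splits the intervals of one chromosome at every start and end and counts how many intervals cover each piece. Two peaks from different samples that mark the same site but are shifted by 50 bp yield two edge fragments supported by a single sample each, and a count-based score would report those fragments as differences.

```shell
# Toy multi-intersection for ONE chromosome (illustration only, not bedtools).
# Input (TSV): chrom start end   (0-based, half-open, as in BED)
# Output (TSV): chrom start end count, one line per piece covered by >= 1 interval.
toy_multiinter() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         n++; chrom = $1; s[n] = $2 + 0; e[n] = $3 + 0
         b[$2 + 0] = 1; b[$3 + 0] = 1
       }
       END {
         # Collect the distinct boundaries and sort them numerically (insertion sort).
         m = 0
         for (x in b) pts[++m] = x + 0
         for (i = 2; i <= m; i++) {
           v = pts[i]; j = i - 1
           while (j >= 1 && pts[j] > v) { pts[j + 1] = pts[j]; j-- }
           pts[j + 1] = v
         }
         # Count how many input intervals cover each elementary piece.
         for (i = 1; i < m; i++) {
           c = 0
           for (k = 1; k <= n; k++) {
             if (s[k] <= pts[i] && e[k] >= pts[i + 1]) c++
           }
           if (c > 0) print chrom, pts[i], pts[i + 1], c
         }
       }'
}

# Two samples, same site, peaks shifted by 50 bp.
printf 'chr1\t100\t500\nchr1\t150\t550\n' | toy_multiinter
# prints (tab-separated):
# chr1 100 150 1
# chr1 150 500 2
# chr1 500 550 1
```

The same splitting is what makes the approach sensitive to peak width: if one group's peaks are systematically wider, the extra flanks show up as fragments covered by that group only.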

iromeo commented 6 years ago

Weak consensus (>= 2) isn't a silver bullet, even for H3K4me1. Here we could actually use something intermediate between the median and the weak consensus.
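One possible way to pick such an intermediate threshold is sketched below. The helper takes "weak" consensus to mean support in at least 2 tracks, as stated above. It assumes "median" consensus means support in at least half of the n tracks, rounded up, and uses the rounded-up midpoint of the two. The function name and the midpoint rule are hypothetical choices, not something this thread settled on.

```shell
# Hypothetical helper: candidate consensus thresholds for n tracks.
# Assumptions: weak = present in >= 2 tracks; median = present in at least
# half of the tracks, rounded up; intermediate = midpoint of both, rounded up.
consensus_thresholds() {
  n=$1
  weak=2
  median=$(( (n + 1) / 2 ))
  intermediate=$(( (weak + median + 1) / 2 ))
  echo "weak=$weak median=$median intermediate=$intermediate"
}

consensus_thresholds 10   # prints: weak=2 median=5 intermediate=4
```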

iromeo commented 6 years ago

There are still lots of differences in minor peaks or in their parts.

E.g., H3K4me1, consensus >= 6:

$ bash /mnt/stripe/washu/bed/consensus_diffscore.sh /mnt/stripe/bio/experiments/configs/benchmark/benchmark/H3K4me1/zinbra/clean 6 > /mnt/stripe/bio/experiments/configs/benchmark/benchmark/H3K4me1/zinbra_cons_diffscore_c6.bed
$ cat /mnt/stripe/bio/experiments/configs/benchmark/benchmark/H3K4me1/zinbra_cons_diffscore_c6.bed | awk '{ print $0, $3-$2 }' | awk '{ if ($NF >= 6 && ( $5 <= 0 || $6 <= 0) && $8 > 200 ) print $0}' | sort -k7,7nr -k5,5nr

Top peaks where one age group doesn't intersect the consensus peak:

iromeo commented 6 years ago

Top scores with <= 2 exceptions in the other group:

cat /mnt/stripe/bio/experiments/configs/benchmark/benchmark/H3K4me1/zinbra_cons_diffscore_c10.bed | awk '{ print $0, $3-$2 }' | awk '{ if ($NF >= 6 && ( $5 <= 2 || $6 <= 2) && $8 > 200 ) print $0}' | sort -k7,7nr  -k5,5nr | less

chr10  38321600   38322000   12  11  1  10  400
chr1   27097800   27098200   13  11  2  9   400
chr3   107632400  107632800  13  11  2  9   400
chr4   83563200   83563600   13  11  2  9   400
chr11  72953400   72953800   12  10  2  8   400
chr13  34252800   34253800   12  10  2  8   1000
chr21  36671000   36671600   11  10  2  8   600
chrX   68247600   68248000   10  9   2  7   400
chr10  2967400    2968000    10  8   2  6   600
chr13  113457400  113457800  10  8   2  6   400
chr22  38336600   38337000   10  8   2  6   400
chr3   188621200  188621800  10  8   2  6   600
chr4   184390800  184391600  10  8   2  6   800
chr6   89737400   89737800   10  8   2  6   400
chr8   22238200   22238600   10  8   2  6   400
chrX   70711800   70712200   10  8   2  6   400
chrX   124338000  124338400  10  2   8  6   400
chrY   4281600    4282000    10  2   8  6   400
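The filter-and-rank one-liners above can be sketched as a parameterized shell function. It assumes the column layout visible in the table: region in columns 1-3, columns 5 and 6 holding the two age groups' intersection counts (which group is which is not stated here), and column 7 their absolute difference. Setting `max_exc=0` corresponds to the first command and `max_exc=2` to the second. The function name is hypothetical. The original `$NF >= 6` check is omitted because, once the width has been appended, it tests the width and is already implied by `width > 200`.

```shell
# Hypothetical parameterized version of the awk | awk | sort pipeline above.
# Input: whitespace-separated rows: chrom start end c4 group_a group_b absdiff
#   max_exc   - keep a region if either group has at most this many hits (default 2)
#   min_width - keep a region only if it is wider than this many bp (default 200)
# Output: kept rows with the width appended, sorted by absdiff, then group_a, descending.
top_diff_regions() {
  max_exc=${1:-2}
  min_width=${2:-200}
  awk -v max_exc="$max_exc" -v min_width="$min_width" '
    {
      width = $3 - $2
      if (($5 <= max_exc || $6 <= max_exc) && width > min_width)
        print $0, width
    }' | sort -k7,7nr -k5,5nr
}

# Usage example: two rows from the table above, without the appended width column.
printf '%s\n' \
  'chrY 4281600 4282000 10 2 8 6' \
  'chr10 38321600 38322000 12 11 1 10' \
  | top_diff_regions 2 200
```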

iromeo commented 6 years ago

Seems there is nothing useful here. We could try to recompute this for "together fit" chromosomes instead of split ones, though I don't believe it would really change the result.

olegs commented 6 years ago

It looks like this issue can be closed, with the conclusion that no significant results were found.

iromeo commented 6 years ago

My last point was to recompute these scores for the latest benchmark results.

olegs commented 6 years ago

Since it has not been recomputed on the latest peak calling results, reopening.

iromeo commented 6 years ago

Related issue: Fisher test for peak differences, https://github.com/JetBrains-Research/washu/issues/59

olegs commented 6 years ago

New peak calling results were checked in #59, closing.

JetBrains-Research / washu

Consensus abs difference score #36