iromeo closed this issue 6 years ago.
Initial implementation: simply uses bedtools multi-intersection regions.
Cons: there are false positives, because peaks may be slightly shifted between samples.
Pros: can detect changes in peak width.
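To show where both the pros and the cons come from, here is a minimal sketch that summarises `bedtools multiinter` output per group. multiinter reports each interval with its chrom, start, end, number of covering files, a list of those files, and then one 0/1 column per input file. A slightly shifted peak therefore splits into small fragments covered by only one sample, and each fragment shows up as a "difference". The helper name `multiinter_group_diff` and the convention that the first `n_a` input files form group A are hypothetical.

```shell
# Hypothetical helper: per-group counts from `bedtools multiinter` output.
# Input columns: chrom start end num list flag_1 ... flag_N (flag_i is 0 or 1).
# Assumption: the first n_a files given to -i are group A, the rest group B.
multiinter_group_diff() {
  awk -v n_a="$1" 'BEGIN { OFS = "\t" }
    {
      a = 0; b = 0
      # Per-file flags start at column 6.
      for (i = 6; i <= NF; i++) {
        if (i < 6 + n_a) a += $i; else b += $i
      }
      d = a - b
      if (d < 0) d = -d
      # Output: chrom start end total groupA groupB |A - B|
      print $1, $2, $3, $4, a, b, d
    }'
}
```

Example usage with hypothetical file names: `bedtools multiinter -i od1.bed od2.bed yd1.bed yd2.bed | multiinter_group_diff 2`. multiinter expects sorted inputs, and `-header` should not be passed, because this function treats every line as data.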
Weak consensus (>= 2) isn't a silver bullet, even for H3K4me1. Here we could use something intermediate between the median and weak consensus.
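One way to make "something intermediate" concrete is to derive the threshold from the number of samples. This is only a sketch: the midpoint rule, the helper names `consensus_threshold` and `filter_consensus`, and the assumption that the supporting-sample count sits in column 4 (as in the diffscore output in this issue) are all hypothetical.

```shell
# Hypothetical helper: consensus threshold for n samples.
#   weak:         present in >= 2 samples
#   median:       present in >= ceil(n / 2) samples
#   intermediate: midpoint of the two, rounded up (assumed rule)
consensus_threshold() {
  local n=$1
  local mode=$2
  local weak=2
  local median=$(( (n + 1) / 2 ))
  case "$mode" in
    weak)         echo "$weak" ;;
    median)       echo "$median" ;;
    intermediate) echo $(( (weak + median + 1) / 2 )) ;;
    *)            echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}

# Keep regions supported by at least k samples (count assumed in column 4).
filter_consensus() {
  awk -v k="$1" '$4 >= k'
}
```

For 20 samples this gives 2 (weak), 10 (median) and 6 (intermediate), so a call like `filter_consensus "$(consensus_threshold 20 intermediate)" < regions.bed` (hypothetical file) keeps regions present in at least 6 samples.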
There are still many differences in minor peaks or in parts of peaks.
E.g. H3K4me1 with consensus >= 6:
$ bash /mnt/stripe/washu/bed/consensus_diffscore.sh /mnt/stripe/bio/experiments/configs/benchmark/benchmark/H3K4me1/zinbra/clean 6 > /mnt/stripe/bio/experiments/configs/benchmark/benchmark/H3K4me1/zinbra_cons_diffscore_c6.bed
$ cat /mnt/stripe/bio/experiments/configs/benchmark/benchmark/H3K4me1/zinbra_cons_diffscore_c6.bed | awk '{ print $0, $3-$2 }' | awk '{ if ($NF >= 6 && ( $5 <= 0 || $6 <= 0) && $8 > 200 ) print $0}' | sort -k7,7nr -k5,5nr |
Top peaks where one age group doesn't intersect the consensus peak:
Top scores with <= 2 exceptions in the other group:
$ cat /mnt/stripe/bio/experiments/configs/benchmark/benchmark/H3K4me1/zinbra_cons_diffscore_c10.bed | awk '{ print $0, $3-$2 }' | awk '{ if ($NF >= 6 && ( $5 <= 2 || $6 <= 2) && $8 > 200 ) print $0}' | sort -k7,7nr -k5,5nr | less
chr10 38321600 38322000 12 11 1 10 400
chr1 27097800 27098200 13 11 2 9 400
chr3 107632400 107632800 13 11 2 9 400
chr4 83563200 83563600 13 11 2 9 400
chr11 72953400 72953800 12 10 2 8 400
chr13 34252800 34253800 12 10 2 8 1000
chr21 36671000 36671600 11 10 2 8 600
chrX 68247600 68248000 10 9 2 7 400
chr10 2967400 2968000 10 8 2 6 600
chr13 113457400 113457800 10 8 2 6 400
chr22 38336600 38337000 10 8 2 6 400
chr3 188621200 188621800 10 8 2 6 600
chr4 184390800 184391600 10 8 2 6 800
chr6 89737400 89737800 10 8 2 6 400
chr8 22238200 22238600 10 8 2 6 400
chrX 70711800 70712200 10 8 2 6 400
chrX 124338000 124338400 10 2 8 6 400
chrY 4281600 4282000 10 2 8 6 400
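The filter above can also be written as a single awk pass. This sketch assumes the diffscore file has seven whitespace-separated columns (chrom, start, end, total count, count in each of the two groups, absolute difference), which is inferred from the output rows rather than from the script itself; `top_diffs` is a hypothetical name. Since the first awk appends the width as the last field, `$NF >= 6` in the original pipeline tests the width and is already implied by `$8 > 200`. A cutoff on the difference column would be `$7 >= 6`, so that check is left out here.

```shell
# Hypothetical helper: rows where one group has at most m supporting samples
# and the region is wider than w bp, sorted by difference, then group A count.
top_diffs() {
  awk -v m="$1" -v w="$2" 'BEGIN { OFS = "\t" }
    {
      width = $3 - $2
      # Append the width as column 8, like the original first awk.
      if (($5 <= m || $6 <= m) && width > w) print $0, width
    }' | sort -k7,7nr -k5,5nr
}
```

Example: `top_diffs 2 200 < zinbra_cons_diffscore_c10.bed | less`.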
Nothing useful, it seems. We could try recomputing this for "together fit" chromosomes instead of split ones, but I don't believe it would really change the result.
It looks like this issue can be closed, stating that no significant results were found.
My last point was to recompute these scores for the latest benchmark results.
Reopening, since the scores have not been recomputed on the latest peak calling results.
Related issue: Fisher test for peak differences, https://github.com/JetBrains-Research/washu/issues/59
New peak calling results were checked in #59, closing.
Let's rank all consensus peaks using the following metric, for each consensus region 'r':
score(r) = #{OD intersecting 'r'} - #{YD intersecting 'r'}
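A minimal sketch of this ranking, assuming a hypothetical input where each consensus region carries a comma-separated list of the samples whose peaks intersect it (e.g. `OD1,OD3,YD2`) and sample labels start with `OD` or `YD`. The input layout and the name `rank_consensus` are assumptions; the score itself is the signed difference above, so regions gained in OD sort to the top and regions gained in YD to the bottom.

```shell
# Hypothetical helper. Input: chrom start end labels, where labels is a
# comma-separated list of samples intersecting the region (e.g. "OD1,YD2").
# Output: chrom start end #OD #YD score, sorted by score (descending).
rank_consensus() {
  awk 'BEGIN { OFS = "\t" }
    {
      od = 0; yd = 0
      n = split($4, labels, ",")
      for (i = 1; i <= n; i++) {
        if (labels[i] ~ /^OD/) od++
        else if (labels[i] ~ /^YD/) yd++
      }
      # score(r) = #{OD intersecting r} - #{YD intersecting r}
      print $1, $2, $3, od, yd, od - yd
    }' | sort -k6,6nr
}
```

Sorting by the absolute score instead would mix both directions into one ranking; keeping the sign makes the two tails of the list directly interpretable.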