ay-lab / mustache

Multi-scale Detection of Chromatin Loops from Hi-C and Micro-C Maps using Scale-Space Representation
MIT License

[bug] Analysis of the front end of chromosomes skipped at different bin sizes #36

Closed: fboellmann closed this issue 2 years ago

fboellmann commented 2 years ago

When I analyze Juicer .hic files with vanilla settings at different resolutions, the beginning of the chromosome is not analyzed: the skipped region is about 2000 times the bin size. At 20 kb resolution the first 40 Mb are skipped, even though I set -d 2000000 for every analysis.
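To make the mismatch concrete, here is a small sketch of the arithmetic behind the report. Only the 20 kb value (40 Mb skipped) was actually observed; the other rows are what the "about 2000 bins" pattern would predict, not measured results. The function name skip_table is a hypothetical helper. It contrasts the predicted skipped span with the fixed -d 2000000 distance expressed in bins.

```shell
# Hypothetical illustration of the reported pattern: if roughly the first
# 2000 bins of a chromosome are skipped, the skipped span grows with the bin
# size, while -d 2000000 is a fixed distance that covers fewer bins at
# coarse resolution. Only the 20 kb case was observed in the report.
skip_table() {
  skipped_bins=2000   # from the report: skipped region ~2000x the bin size
  max_dist=2000000    # the -d value used in every run
  for res in 20000 10000 5000 3000 2000; do
    echo "res=$res skipped_bp=$((skipped_bins * res)) d_in_bins=$((max_dist / res))"
  done
}
```

Calling skip_table prints one row per bin size; the first row is res=20000 skipped_bp=40000000 d_in_bins=100, i.e. the 40 Mb gap is 20 times larger than the 2 Mb (100-bin) search distance, which suggests the gap is not driven by -d.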


/home/frank/tools/mustache/mustache/mustache.py -f $(pwd)/NT/hiccups/NT.inter_1.hic -d 2000000 -r 20000 -norm NONE -pt 0.05 -st 0.8 -sz 1.6 -oc 2 -i 10 -p 6 -o $(pwd)/mustache/NT/NT.NONE.q1.r20k.p05.st08.sz16.oc2.out.tsv -cz /oasis/tscc/scratch/frank/customer_data/references/hg38/hg38.chrom.sizes &
/home/frank/tools/mustache/mustache/mustache.py -f $(pwd)/NT/hiccups/NT.inter_1.hic -d 2000000 -r 10000 -norm NONE -pt 0.05 -st 0.8 -sz 1.6 -oc 2 -i 10 -p 6 -o $(pwd)/mustache/NT/NT.NONE.q1.r10k.p05.st08.sz16.oc2.out.tsv -cz /oasis/tscc/scratch/frank/customer_data/references/hg38/hg38.chrom.sizes &
/home/frank/tools/mustache/mustache/mustache.py -f $(pwd)/NT/hiccups/NT.inter_1.hic -d 2000000 -r 5000 -norm NONE -pt 0.05 -st 0.8 -sz 1.6 -oc 2 -i 10 -p 6 -o $(pwd)/mustache/NT/NT.NONE.q1.r5k.p05.st08.sz16.oc2.out.tsv -cz /oasis/tscc/scratch/frank/customer_data/references/hg38/hg38.chrom.sizes &
/home/frank/tools/mustache/mustache/mustache.py -f $(pwd)/NT/hiccups/NT.inter_1.hic -d 2000000 -r 3000 -norm NONE -pt 0.05 -st 0.8 -sz 1.6 -oc 2 -i 10 -p 6 -o $(pwd)/mustache/NT/NT.NONE.q1.r3k.p05.st08.sz16.oc2.out.tsv -cz /oasis/tscc/scratch/frank/customer_data/references/hg38/hg38.chrom.sizes &
/home/frank/tools/mustache/mustache/mustache.py -f $(pwd)/NT/hiccups/NT.inter_1.hic -d 2000000 -r 2000 -norm NONE -pt 0.05 -st 0.8 -sz 1.6 -oc 2 -i 10 -p 6 -o $(pwd)/mustache/NT/NT.NONE.q1.r2k.p05.st08.sz16.oc2.out.tsv -cz /oasis/tscc/scratch/frank/customer_data/references/hg38/hg38.chrom.sizes
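The five invocations above differ only in -r and the output file name, so they can be generated in a loop. Below is a minimal POSIX-shell sketch that uses the same Mustache flags as the report. MUSTACHE, HIC, OUTDIR, NORM and DRY_RUN are hypothetical helper variables for this script, not Mustache options, and run_resolutions is a hypothetical function name. -cz is left out because the maintainer's reply in this thread says Mustache reads chromosome sizes from the .hic file.

```shell
#!/bin/sh
# Sketch only: MUSTACHE, HIC, OUTDIR, NORM and DRY_RUN are hypothetical helper
# variables (not Mustache options); override them from the environment.
MUSTACHE=${MUSTACHE:-/home/frank/tools/mustache/mustache/mustache.py}
HIC=${HIC:-$(pwd)/NT/hiccups/NT.inter_1.hic}
OUTDIR=${OUTDIR:-$(pwd)/mustache/NT}
NORM=${NORM:-NONE}

# Launch one Mustache run per bin size in the background, then wait for all.
# With DRY_RUN set, print each command instead of running it.
run_resolutions() {
  for res in 20000 10000 5000 3000 2000; do
    out="$OUTDIR/NT.$NORM.q1.r$((res / 1000))k.p05.st08.sz16.oc2.out.tsv"
    # Reuse the positional parameters as an argument list (POSIX sh has no arrays).
    set -- "$MUSTACHE" -f "$HIC" -d 2000000 -r "$res" -norm "$NORM" \
      -pt 0.05 -st 0.8 -sz 1.6 -oc 2 -i 10 -p 6 -o "$out"
    if [ -n "${DRY_RUN:-}" ]; then
      printf '%s\n' "$*"
    else
      "$@" &
    fi
  done
  wait   # block until every background run has finished
}
```

Running DRY_RUN=1 run_resolutions prints the five commands for review; run_resolutions alone launches them in parallel like the original `&` chain and then waits. Because -norm comes from NORM, the same loop can rerun the comparison with a normalization vector present in the .hic file instead of NONE.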

ay-lab commented 2 years ago

Hi, is there a reason you are using -norm NONE? Also, you no longer need the -cz parameter: Mustache reads the chromosome sizes directly from the .hic file. Can you share your data with me (abbas@lji.org)?

fboellmann commented 2 years ago

Using no normalization seems to produce significantly more loop calls at all FDR values, so I stuck with that choice. I also did a rough parameter sweep and plotted the FDR values of the sorted loop calls. The error appears in all the data sets I am using, so it might be related to the unnecessary settings. I will have to repeat the analysis with shareable data. I did figure out a useful approach for combining multiple resolutions using loop overlap and comparative analysis. My direct email is frank at arimagenomics dot com.

Thanks,
Frank

roayaei commented 2 years ago

I am not sure I understand. Normalization is not supposed to produce more loops, and it is definitely needed. My guess is that using the raw counts is causing the problem, especially since Mustache performs a local normalization on top of the normalization selected with -norm. I would be happy to hear about your way of combining loops. Are you requiring exact overlaps, or are you allowing some slack when matching loops between resolutions?
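The thread does not describe how loops from different resolutions are combined. As one illustration of "allowing some slack", the sketch below treats two loops as the same if they lie on the same chromosome pair and both anchor start coordinates differ by at most SLACK bp. The function name match_loops is hypothetical, and the column layout (tab-separated chr1, start1, end1, chr2, start2, end2 as the first six columns, with an optional header line) is an assumption to check against your own Mustache output.

```shell
# match_loops A.tsv B.tsv SLACK   (hypothetical helper)
# Print every loop in B that has a partner in A on the same chromosome pair
# whose two anchor start coordinates each differ by at most SLACK bp.
# Assumes tab-separated columns chr1, start1, end1, chr2, start2, end2, ...;
# lines whose second column is not an integer (e.g. a header) are skipped.
# Naive all-pairs comparison: fine for thousands of loops, slow beyond that.
match_loops() {
  awk -F '\t' -v slack="$3" '
    function abs(x) { return (x < 0) ? -x : x }
    $2 !~ /^[0-9]+$/ { next }
    FILENAME == ARGV[1] { n++; c1[n] = $1; s1[n] = $2; c2[n] = $4; s2[n] = $5; next }
    {
      for (i = 1; i <= n; i++)
        if (c1[i] == $1 && c2[i] == $4 &&
            abs(s1[i] - $2) <= slack && abs(s2[i] - $5) <= slack) {
          print
          break
        }
    }
  ' "$1" "$2"
}
```

For example, match_loops r10k.tsv r5k.tsv 10000 would keep the 5 kb loops that have a 10 kb counterpart within 10 kb at both anchors; a SLACK of 0 reduces it to exact start-coordinate matching, which rarely succeeds across bin sizes.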

fboellmann commented 2 years ago

I started with a fresh install of Miniconda 3.8. Then I had to install all the dependencies without the mustache environment, to make my "base" conda environment compatible with Mustache. I also had to run "conda install -c conda-forge libgcc-ng" before installing hic-straw and cooler.