aidenlab / juicer

A One-Click System for Analyzing Loop-Resolution Hi-C Experiments
http://aidenlab.org
MIT License

Arrowhead sparsity threshold should change depending on bin size #74

Closed (HelenLong1 closed this 5 years ago)

HelenLong1 commented 6 years ago

Hi,

The sparsity threshold in Arrowhead requires a very high number of read pairs. At 5kb and 10kb bin sizes (the Juicer Arrowhead defaults), this threshold is probably sensible, because when it is not met Arrowhead calls very few domains.

However, if you use a larger bin size, you can find more domains in lower-resolution data. Indeed, many papers have called contact domains at a 45kb bin size in order to use data of much lower resolution than Juicer seems to require.

I would therefore like to suggest that you alter the sparsity threshold so that it is different for different bin sizes.

On my dataset with 187,335,364 read pairs, I cannot run Arrowhead at any bin size without using --ignore_sparsity. With the flag, my results are:

5kb: 19 domains
10kb: 665 domains
25kb: 2689 domains
50kb: 1889 domains
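To illustrate why a single threshold may be too strict at larger bin sizes, here is a rough back-of-the-envelope sketch of how average coverage per bin grows with bin size at a fixed sequencing depth. It uses the read-pair count reported above; the genome length is an assumed, roughly human-sized value (the genome is not stated here), and the helper name mean_pairs_per_bin is made up for this example. This is not how Arrowhead actually computes sparsity, only the scaling argument.

```python
# Rough illustration only: how average coverage per bin scales with bin size.
# GENOME_LENGTH_BP is an assumed, roughly human-sized value; the genome of the
# dataset is not stated. This is NOT Arrowhead's actual sparsity calculation.

TOTAL_READ_PAIRS = 187_335_364      # read pairs reported for the dataset
GENOME_LENGTH_BP = 3_100_000_000    # assumption, for illustration only


def mean_pairs_per_bin(total_pairs: int, genome_bp: int, bin_size_bp: int) -> float:
    """Average read pairs per genomic bin if pairs were spread evenly."""
    n_bins = -(-genome_bp // bin_size_bp)  # ceiling division
    return total_pairs / n_bins


if __name__ == "__main__":
    for bin_size in (5_000, 10_000, 25_000, 50_000):
        avg = mean_pairs_per_bin(TOTAL_READ_PAIRS, GENOME_LENGTH_BP, bin_size)
        print(f"{bin_size // 1000}kb bins: ~{avg:,.0f} read pairs per bin on average")
```

Under these assumptions the average coverage per bin rises in proportion to the bin size, so a depth that looks sparse at 5kb can be much less sparse at 25kb or 50kb.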

Best wishes,

Helen

nchernia commented 5 years ago

Moving this to Juicebox since that's where the Arrowhead code lives.