Hi,
The sparsity threshold in Arrowhead requires a very high number of read pairs. At the 5kb and 10kb bin sizes (the Juicer Arrowhead defaults) this threshold is probably sensible, because when it is not met Arrowhead calls very few domains.
However, if you use a larger bin size, you can find more domains in lower-resolution data. Indeed, many papers have called contact domains using a 45kb bin size in order to make use of much lower-resolution data than Juicer seems to require.
I would therefore like to suggest that you alter the sparsity threshold so that it depends on the bin size.
On my dataset with 187,335,364 read pairs, I cannot run Arrowhead at any bin size without using --ignore_sparsity. With the flag, my results are:
5kb: 19 domains
10kb: 665 domains
25kb: 2689 domains
50kb: 1889 domains
Best wishes,
Helen