CollasLab / edd

Enriched Domain Detector for ChIP-seq data
https://pypi.python.org/pypi/edd
MIT License
Could not find a suitable bin size for Macaque genome #13

Open runningvato opened 5 years ago

Hello,

I have been using EDD to call large chromatin domains in both mouse and human data, but I have run into a problem with samples from the rhesus macaque (rheMac8 assembly). Besides running the analysis with the default settings, I have tried specifying a gap penalty score and lowering required_fraction_of_informative_bins to 0.98. All of these attempts resulted in:

    Traceback (most recent call last):
      File "/usr/local/bin/edd", line 146, in <module>
        main(args, config)
      File "/usr/local/bin/edd", line 60, in main
        loader.load_single_experiment(args.ip_bam, args.input_bam)
      File "/usr/local/lib/python2.7/dist-packages/eddlib/experiment.py", line 166, in load_single_experiment
        self.df = self.__adjust_bin_size_and_get_df(self.exp)
      File "/usr/local/lib/python2.7/dist-packages/eddlib/experiment.py", line 149, in __adjust_bin_size_and_get_df
        nib_lim=self.nib_lim)
      File "/usr/local/lib/python2.7/dist-packages/eddlib/estimate.py", line 30, in bin_size
        assert bin_size < 100, "Could not find a suitable bin size."

One difference I notice between the rheMac8 assembly and the mouse and human assemblies is that the unalignable regions file has far more entries for rheMac8 (63767, versus 687 in mouse and 819 in human). However, the fraction of the genome covered by these regions is approximately the same as in either of the other species.
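
For anyone who wants to reproduce this comparison, here is a minimal sketch of how the two figures (entry count and covered fraction) could be computed. It assumes a standard BED file of unalignable regions and a two-column chrom sizes file. The function names are illustrative only and are not part of EDD.

```python
import io


def read_chrom_sizes(handle):
    """Parse a two-column chrom sizes file (name, length) into a dict."""
    sizes = {}
    for line in handle:
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def read_bed_intervals(handle):
    """Parse BED lines into {chrom: [(start, end), ...]}, skipping header lines."""
    intervals = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 3 or line.startswith("#"):
            continue
        if fields[0] in ("track", "browser"):
            continue
        intervals.setdefault(fields[0], []).append((int(fields[1]), int(fields[2])))
    return intervals


def merged_length(spans):
    """Total bases covered by (start, end) spans, counting overlaps only once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(spans):
        if cur_end is None or start > cur_end:
            # Close the previous merged span before starting a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def summarize(bed_handle, sizes_handle):
    """Return (number of BED entries, fraction of the genome they cover)."""
    sizes = read_chrom_sizes(sizes_handle)
    intervals = read_bed_intervals(bed_handle)
    n_entries = sum(len(spans) for spans in intervals.values())
    # Only count chromosomes that appear in the chrom sizes file.
    covered = sum(merged_length(s) for c, s in intervals.items() if c in sizes)
    genome = sum(sizes.values())
    fraction = covered / float(genome) if genome else 0.0
    return n_entries, fraction


if __name__ == "__main__":
    # Toy data; replace with open("unalignable.bed") and open("chrom.sizes").
    bed = io.StringIO(u"chr1\t0\t100\nchr1\t50\t150\nchr2\t400\t500\n")
    sizes = io.StringIO(u"chr1\t1000\nchr2\t500\n")
    print(summarize(bed, sizes))
```

Running this on each assembly's files should make it clear whether the regions differ mainly in number (many small fragments) or in total coverage.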

Do you have any suggestions for how I might get EDD to calculate the best bin sizes for these data sets?

Thanks!