biocore-ntnu / epic

(DEPRECATED) epic: diffuse domain ChIP-Seq caller based on SICER
http://bioepic.readthedocs.io
MIT License
31 stars, 6 forks

Using dm3 makes epic crash #24

Closed: topalis closed this issue 7 years ago

topalis commented 8 years ago

Hi, I am very new to epic (I installed it today), and when I try to run it with multiple cores I get the following error. (Running it on a single core appears unaffected so far; it is still running.) Please advise.

Pantelis Topalis

epic -t ../sorted_H3_APAA.bed -c ../sorted_H3_BiB.bed --number-cores 16 -gn dm3 -w 200 -g 3 -fs 150 -fdr 0.05 -egs 0.72 -sm APAA_BiB_matrix

epic -t ../sorted_H3_APAA.bed -c ../sorted_H3_BiB.bed --number-cores 16 -gn dm3 -w 200 -g 3 -fs 150 -fdr 0.05 -egs 0.72 -sm APAA_BiB_matrix (File: epic, Log level: INFO, Time: Wed, 13 Jul 2016 21:22:54 )

Binning ../sorted_H3_APAA.bed (File: run_epic, Log level: INFO, Time: Wed, 13 Jul 2016 21:22:54 )
Binning chromosomes 2L, 2LHet, 2R, 2RHet, 3L, 3LHet, 3R, 3RHet, 4, M, U, Uextra, X, XHet, YHet (File: count_reads_in_windows, Log level: INFO, Time: Wed, 13 Jul 2016 21:22:54 )
Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Wed, 13 Jul 2016 21:23:05 )
Binning ../sorted_H3_BiB.bed (File: run_epic, Log level: INFO, Time: Wed, 13 Jul 2016 21:23:07 )
Binning chromosomes 2L, 2LHet, 2R, 2RHet, 3L, 3LHet, 3R, 3RHet, 4, M, U, Uextra, X, XHet, YHet (File: count_reads_in_windows, Log level: INFO, Time: Wed, 13 Jul 2016 21:23:07 )
Merging the bins on both strands per chromosome. (File: count_reads_in_windows, Log level: INFO, Time: Wed, 13 Jul 2016 21:23:19 )
Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Wed, 13 Jul 2016 21:23:22 )
Traceback (most recent call last):
  File "/usr/local/bin/epic", line 165, in <module>
    run_epic(args)
  File "/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py", line 42, in run_epic
    args.number_cores)
  File "/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py", line 37, in merge_chip_and_input
    for chip_df, input_df in zip(chip_dfs, input_dfs))
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 764, in __call__
    self.retrieve()
  File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 715, in retrieve
    raise exception
joblib.my_exceptions.JoblibValueError: JoblibValueError


Multiprocessing exception:
...........................................................................
/usr/local/bin/epic in <module>()
    160     elif not args.effective_genome_length and args.paired_end:
    161         logging.info("Using paired end so setting readlength to 100.")
    162         args.effective_genome_length = get_effective_genome_length(args.genome,
    163                                                                     100)
    164
--> 165     run_epic(args)
    166
    167
    168
    169

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/run/run_epic.py in run_epic(args=Namespace(control=['../sorted_H3_BiB.bed'], effe...tment=['../sorted_H3_APAA.bed'], window_size=200))
     37
     38     nb_chip_reads = get_total_number_of_reads(chip_merged_sum)
     39     nb_input_reads = get_total_number_of_reads(input_merged_sum)
     40
     41     merged_dfs = merge_chip_and_input(chip_merged_sum, input_merged_sum,
---> 42                                       args.number_cores)
        args.number_cores = 16
     43
     44     score_threshold, island_enriched_threshold, average_window_readcount = \
     45         compute_background_probabilities(nb_chip_reads, args)
     46

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py in merge_chip_and_input(chip_dfs=[ Chromosome Bin Count 0 c...chr2L 22986200 3

[107160 rows x 3 columns], Chromosome Bin Count 0 chr2LHet 1... chr2LHet 367800 2

[539 rows x 3 columns], Chromosome Bin Count 0 chr... chr2R 21145600 3

[98361 rows x 3 columns], Chromosome Bin Count 0 chr2RHet ...chr2RHet 3277200 2

[5745 rows x 3 columns], Chromosome Bin Count 0 c...chr3L 24537000 1

[113832 rows x 3 columns], Chromosome Bin Count 0 chr3LHet ...chr3LHet 2547800 5

[5234 rows x 3 columns], Chromosome Bin Count 0 c...chr3R 27898600 4

[134035 rows x 3 columns], Chromosome Bin Count 0 chr3RHet ...chr3RHet 2517400 8

[4774 rows x 3 columns], Chromosome Bin Count 0 chr4 ... chr4 1285200 1

[5699 rows x 3 columns], Chromosome Bin Count 0 chrM 1800 ... chrM 8800 2 3 chrM 12000 1, Chromosome Bin Count 0 chrU... chrU 10047200 1

[8182 rows x 3 columns], Chromosome Bin Count 0 chrUextra...rUextra 29003800 1

[4023 rows x 3 columns], Chromosome Bin Count 0 ... chrX 22422000 17

[102257 rows x 3 columns], Chromosome Bin Count 0 chrXHet ... chrXHet 190000 3

[522 rows x 3 columns], Chromosome Bin Count 0 chrYHet ... chrYHet 341400 2

[471 rows x 3 columns]], input_dfs=[ Chromosome Bin Count 0 c...chr2L 22997600 1

[107540 rows x 3 columns], Chromosome Bin Count 0 chr2LHet 1... chr2LHet 367800 2

[577 rows x 3 columns], Chromosome Bin Count 0 chr... chr2R 21146200 1

[98668 rows x 3 columns], Chromosome Bin Count 0 chr2RHet ...chr2RHet 3277200 2

[5981 rows x 3 columns], Chromosome Bin Count 0 c...chr3L 24535600 18

[114448 rows x 3 columns], Chromosome Bin Count 0 chr3LHet ...chr3LHet 2547800 4

[5631 rows x 3 columns], Chromosome Bin Count 0 c...chr3R 27898600 4

[134479 rows x 3 columns], Chromosome Bin Count 0 chr3RHet ...chr3RHet 2517400 6

[5181 rows x 3 columns], Chromosome Bin Count 0 chr4 ... chr4 1318000 1

[5701 rows x 3 columns], Chromosome Bin Count 0 chrM 600 ... chrM 12000 2 4 chrM 12200 1, Chromosome Bin Count 0 ch... chrU 10043000 2

[10082 rows x 3 columns], Chromosome Bin Count 0 chrUextra...rUextra 29000400 1

[5706 rows x 3 columns], Chromosome Bin Count 0 ... chrX 22422200 3

[103662 rows x 3 columns], Chromosome Bin Count 0 chrXHet ... chrXHet 197200 1

[544 rows x 3 columns], Chromosome Bin Count 0 chrYHet ... chrYHet 341400 2

[574 rows x 3 columns]], nb_cpu=16)
     32     assert len(chip_dfs) == len(input_dfs)
     33
     34     logging.info("Merging ChIP and Input data.")
     35     merged_chromosome_dfs = Parallel(n_jobs=nb_cpu)(
     36         delayed(_merge_chip_and_input)(chip_df, input_df)
---> 37         for chip_df, input_df in zip(chip_dfs, input_dfs))
        chip_dfs = [ Chromosome Bin Count 0 c...chr2L 22986200 3

[107160 rows x 3 columns], Chromosome Bin Count 0 chr2LHet 1... chr2LHet 367800 2

[539 rows x 3 columns], Chromosome Bin Count 0 chr... chr2R 21145600 3

[98361 rows x 3 columns], Chromosome Bin Count 0 chr2RHet ...chr2RHet 3277200 2

[5745 rows x 3 columns], Chromosome Bin Count 0 c...chr3L 24537000 1

[113832 rows x 3 columns], Chromosome Bin Count 0 chr3LHet ...chr3LHet 2547800 5

[5234 rows x 3 columns], Chromosome Bin Count 0 c...chr3R 27898600 4

[134035 rows x 3 columns], Chromosome Bin Count 0 chr3RHet ...chr3RHet 2517400 8

[4774 rows x 3 columns], Chromosome Bin Count 0 chr4 ... chr4 1285200 1

[5699 rows x 3 columns], Chromosome Bin Count 0 chrM 1800 ... chrM 8800 2 3 chrM 12000 1, Chromosome Bin Count 0 chrU... chrU 10047200 1

[8182 rows x 3 columns], Chromosome Bin Count 0 chrUextra...rUextra 29003800 1

[4023 rows x 3 columns], Chromosome Bin Count 0 ... chrX 22422000 17

[102257 rows x 3 columns], Chromosome Bin Count 0 chrXHet ... chrXHet 190000 3

[522 rows x 3 columns], Chromosome Bin Count 0 chrYHet ... chrYHet 341400 2

[471 rows x 3 columns]] input_dfs = [ Chromosome Bin Count 0 c...chr2L 22997600 1

[107540 rows x 3 columns], Chromosome Bin Count 0 chr2LHet 1... chr2LHet 367800 2

[577 rows x 3 columns], Chromosome Bin Count 0 chr... chr2R 21146200 1

[98668 rows x 3 columns], Chromosome Bin Count 0 chr2RHet ...chr2RHet 3277200 2

[5981 rows x 3 columns], Chromosome Bin Count 0 c...chr3L 24535600 18

[114448 rows x 3 columns], Chromosome Bin Count 0 chr3LHet ...chr3LHet 2547800 4

[5631 rows x 3 columns], Chromosome Bin Count 0 c...chr3R 27898600 4

[134479 rows x 3 columns], Chromosome Bin Count 0 chr3RHet ...chr3RHet 2517400 6

[5181 rows x 3 columns], Chromosome Bin Count 0 chr4 ... chr4 1318000 1

[5701 rows x 3 columns], Chromosome Bin Count 0 chrM 600 ... chrM 12000 2 4 chrM 12200 1, Chromosome Bin Count 0 ch... chrU 10043000 2

[10082 rows x 3 columns], Chromosome Bin Count 0 chrUextra...rUextra 29000400 1

[5706 rows x 3 columns], Chromosome Bin Count 0 ... chrX 22422200 3

[103662 rows x 3 columns], Chromosome Bin Count 0 chrXHet ... chrXHet 197200 1

[544 rows x 3 columns], Chromosome Bin Count 0 chrYHet ... chrYHet 341400 2

[574 rows x 3 columns]]
     38     return merged_chromosome_dfs
     39
     40
     41 def get_total_number_of_reads(dfs):

...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in __call__(self=Parallel(n_jobs=16), iterable=<generator object >)
    759         if pre_dispatch == "all" or n_jobs == 1:
    760             # The iterable was consumed all at once by the above for loop.
    761             # No need to wait for async callbacks to trigger to
    762             # consumption.
    763             self._iterating = False
--> 764         self.retrieve()
        self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=16)>
    765         # Make sure that we get a last message telling us we are done
    766         elapsed_time = time.time() - self._start_time
    767         self._print('Done %3i out of %3i | elapsed: %s finished',
    768                     (len(self._output), len(self._output),


Sub-process traceback:

ValueError Wed Jul 13 21:23:22 2016
PID: 33319 Python 2.7.11+: /usr/bin/python
...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in __call__(self=)
    122     def __init__(self, iterator_slice):
    123         self.items = list(iterator_slice)
    124         self._size = len(self.items)
    125
    126     def __call__(self):
--> 127         return [func(*args, **kwargs) for func, args, kwargs in self.items]
        func =
        args = ( Chromosome Bin Count 0 c...chr3R 27898600 4

[134035 rows x 3 columns], Chromosome Bin Count 0 c...chr3R 27898600 4

[134479 rows x 3 columns])
        kwargs = {}
        self.items = [(, ( Chromosome Bin Count 0 c...chr3R 27898600 4

[134035 rows x 3 columns], Chromosome Bin Count 0 c...chr3R 27898600 4

[134479 rows x 3 columns]), {})]
    128
    129     def __len__(self):
    130         return self._size
    131

...........................................................................
/usr/local/lib/python2.7/dist-packages/epic/utils/helper_functions.py in _merge_chip_and_input(chip_df= Chromosome Bin Count 0 c...chr3R 27898600 4

[134035 rows x 3 columns], input_df= Chromosome Bin Count 0 c...chr3R 27898600 4

[134479 rows x 3 columns])
     13
     14     chip_df_nb_bins = len(chip_df)
     15     merged_df = chip_df.merge(input_df,
     16                               how="left",
     17                               on=["Chromosome", "Bin"],
---> 18                               suffixes=[" ChIP", " Input"])
     19     merged_df = merged_df[["Chromosome", "Bin", "Count ChIP", "Count Input"]]
     20     merged_df.columns = ["Chromosome", "Bin", "ChIP", "Input"]
     21
     22     merged_df = merged_df.fillna(0)

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas-0.18.1-py2.7-linux-x86_64.egg/pandas/core/frame.py in merge(self= Chromosome Bin Count 0 c...chr3R 27898600 4

[134035 rows x 3 columns], right= Chromosome Bin Count 0 c...chr3R 27898600 4

[134479 rows x 3 columns], how='left', on=['Chromosome', 'Bin'], left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=[' ChIP', ' Input'], copy=True, indicator=False)
   4432               suffixes=('_x', '_y'), copy=True, indicator=False):
   4433         from pandas.tools.merge import merge
   4434         return merge(self, right, how=how, on=on, left_on=left_on,
   4435                      right_on=right_on, left_index=left_index,
   4436                      right_index=right_index, sort=sort, suffixes=suffixes,
-> 4437                      copy=copy, indicator=indicator)
        copy = True
        indicator = False
   4438
   4439     def round(self, decimals=0, *args, **kwargs):
   4440         """
   4441         Round a DataFrame to a variable number of decimal places.

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas-0.18.1-py2.7-linux-x86_64.egg/pandas/tools/merge.py in merge(left= Chromosome Bin Count 0 c...chr3R 27898600 4

[134035 rows x 3 columns], right= Chromosome Bin Count 0 c...chr3R 27898600 4

[134479 rows x 3 columns], how='left', on=['Chromosome', 'Bin'], left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=[' ChIP', ' Input'], copy=True, indicator=False)
     34           suffixes=('_x', '_y'), copy=True, indicator=False):
     35     op = _MergeOperation(left, right, how=how, on=on, left_on=left_on,
     36                          right_on=right_on, left_index=left_index,
     37                          right_index=right_index, sort=sort, suffixes=suffixes,
     38                          copy=copy, indicator=indicator)
---> 39     return op.get_result()
        op.get_result = <bound method _MergeOperation.get_result of >
     40 if __debug__:
     41     merge.__doc__ = _merge_doc % '\nleft : DataFrame'
     42
     43

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas-0.18.1-py2.7-linux-x86_64.egg/pandas/tools/merge.py in get_result(self=)
    212     def get_result(self):
    213         if self.indicator:
    214             self.left, self.right = self._indicator_pre_merge(
    215                 self.left, self.right)
    216
--> 217         join_index, left_indexer, right_indexer = self._get_join_info()
        join_index = undefined
        left_indexer = undefined
        right_indexer = undefined
        self._get_join_info = <bound method _MergeOperation._get_join_info of >
    218
    219         ldata, rdata = self.left._data, self.right._data
    220         lsuf, rsuf = self.suffixes
    221

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas-0.18.1-py2.7-linux-x86_64.egg/pandas/tools/merge.py in _get_join_info(self=)
    348                                            sort=self.sort)
    349         else:
    350             (left_indexer,
    351              right_indexer) = _get_join_indexers(self.left_join_keys,
    352                                                  self.right_join_keys,
--> 353                                                  sort=self.sort, how=self.how)
        self.sort = False
        self.how = 'left'
    354         if self.right_index:
    355             if len(self.left) > 0:
    356                 join_index = self.left.index.take(left_indexer)
    357             else:

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas-0.18.1-py2.7-linux-x86_64.egg/pandas/tools/merge.py in _get_join_indexers(left_keys=[array(['chr3R', 'chr3R', 'chr3R', ..., 'chr3R', 'chr3R', 'chr3R'], dtype=object), memmap([ 0, 200, 400, ..., 27897200, 27898400, 27898600])], right_keys=[array(['chr3R', 'chr3R', 'chr3R', ..., 'chr3R', 'chr3R', 'chr3R'], dtype=object), memmap([ 0, 200, 400, ..., 27897200, 27898400, 27898600])], sort=False, how='left')
    541
    542     # bind sort arg. of _factorize_keys
    543     fkeys = partial(_factorize_keys, sort=sort)
    544
    545     # get left & right join labels and num. of levels at each location
--> 546     llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
        llab = undefined
        rlab = undefined
        shape = undefined
        fkeys =
        left_keys = [array(['chr3R', 'chr3R', 'chr3R', ..., 'chr3R', 'chr3R', 'chr3R'], dtype=object), memmap([ 0, 200, 400, ..., 27897200, 27898400, 27898600])]
        right_keys = [array(['chr3R', 'chr3R', 'chr3R', ..., 'chr3R', 'chr3R', 'chr3R'], dtype=object), memmap([ 0, 200, 400, ..., 27897200, 27898400, 27898600])]
    547
    548     # get flat i8 keys from label lists
    549     lkey, rkey = _get_join_keys(llab, rlab, shape, sort)
    550

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas-0.18.1-py2.7-linux-x86_64.egg/pandas/tools/merge.py in _factorize_keys(lk=memmap([ 0, 200, 400, ..., 27897200, 27898400, 27898600]), rk=memmap([ 0, 200, 400, ..., 27897200, 27898400, 27898600]), sort=False)
    708         lk = com._ensure_object(lk)
    709         rk = com._ensure_object(rk)
    710
    711     rizer = klass(max(len(lk), len(rk)))
    712
--> 713     llab = rizer.factorize(lk)
        llab = undefined
        rizer.factorize =
        lk = memmap([ 0, 200, 400, ..., 27897200, 27898400, 27898600])
    714     rlab = rizer.factorize(rk)
    715
    716     count = rizer.get_count()
    717

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas-0.18.1-py2.7-linux-x86_64.egg/pandas/hashtable.so in pandas.hashtable.Int64Factorizer.factorize (pandas/hashtable.c:15827)()
    854 855 856 857 858 --> 859 860 861 862 863

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas-0.18.1-py2.7-linux-x86_64.egg/pandas/hashtable.so in View.MemoryView.memoryview_cwrapper (pandas/hashtable.c:29882)()
    611 612 613 614 615 --> 616 617 618 619 620

...........................................................................
/usr/local/lib/python2.7/dist-packages/pandas-0.18.1-py2.7-linux-x86_64.egg/pandas/hashtable.so in View.MemoryView.memoryview.__cinit__ (pandas/hashtable.c:26251)()
    318 319 320 321 322 --> 323 324 325 326 327

ValueError: buffer source array is read-only
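
Editorial note: in the last frames above, `lk` is printed as a `memmap` and the error complains about a read-only buffer. That is consistent with the worker receiving its input as a read-only memory map, although the thread does not confirm this as the root cause. The sketch below is only a standard-library analogy of that situation, not the pandas or joblib code path: it maps a temporary file read-only, shows that writing through the view fails, and shows that an in-memory copy is writable. The function name `demo_readonly_buffer` is made up for illustration.

```python
import mmap
import tempfile


def demo_readonly_buffer(payload: bytes):
    """Return (is_readonly, write_succeeded, writable_copy) for a read-only map.

    Hypothetical helper: it only illustrates read-only versus copied buffers.
    """
    if not payload:
        raise ValueError("payload must be non-empty; an empty file cannot be mapped")
    with tempfile.TemporaryFile() as handle:
        handle.write(payload)
        handle.flush()
        # ACCESS_READ makes the mapping (and any view of it) read-only.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            view = memoryview(mapped)
            try:
                is_readonly = view.readonly
                try:
                    view[0] = 0  # writing through a read-only view fails
                    write_succeeded = True
                except TypeError:
                    write_succeeded = False
                # Copying the bytes into memory yields a writable buffer.
                writable_copy = bytearray(view)
            finally:
                # The view must be released before the map can be closed.
                view.release()
    writable_copy[0] = 0  # the copy accepts writes
    return is_readonly, write_succeeded, writable_copy
```

The same idea, copying the data before handing it to code that insists on a writable buffer, is a common workaround for this class of error, but whether it applies to epic here is an assumption.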

topalis commented 8 years ago

Digging a bit deeper: using the default genome (hg18), the run finishes without errors (same input files and parameters). When I use Drosophila (dm3), the error appears again.

endrebak commented 8 years ago

Thanks for reporting. Will look at it when back from vacation in two weeks.

If you want to try fixing it yourself, you can rerun the Snakefile in the scripts directory for dm3 (this is not an option if you do not know Snakemake). There were some issues with this PR: https://github.com/endrebak/epic/pull/3

endrebak commented 8 years ago

Can you try updating joblib (pip install -U joblib) or pandas (pip install -U pandas)? The upgrade may take some time.

Or try running the script once more with dm3, but using only one core? That will make the error messages easier to read and interpret.
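
Editorial note: when trying the upgrade suggested above, it helps to record which joblib and pandas versions are installed before and after. The sketch below does that with the standard library only. It assumes Python 3.8 or newer (the traceback in this issue comes from Python 2.7, where `importlib.metadata` does not exist), and `installed_versions` is a made-up helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("joblib", "pandas")):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(name, ver if ver is not None else "not installed")
```

Pasting this output into a bug report lets the maintainer see the library versions without reading them out of the traceback paths.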

endrebak commented 8 years ago

If you could upload the files and send me a dropbox link or something, it would make it much easier for me to debug. Endrebak85@gmail.com

endrebak commented 8 years ago

What are the chromosomes in your files called?

cut -f 1 ../sorted_H3_APAA.bed | sort | uniq -c

and

cut -f 1 ../sorted_H3_BiB.bed | sort | uniq -c
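
Editorial note: for readers without the shell tools, the two commands above can be approximated in Python. This is a minimal sketch, assuming tab-separated BED files with the chromosome name in the first column; unlike `cut`, it skips blank lines. The helper names `count_chromosomes` and `names_only_in_one` are made up for illustration.

```python
from collections import Counter


def count_chromosomes(bed_path):
    """Count how often each first-column value (chromosome name) occurs."""
    counts = Counter()
    with open(bed_path) as handle:
        for line in handle:
            if not line.strip():
                continue  # skip blank lines
            # Keep everything before the first tab, like `cut -f 1`.
            counts[line.rstrip("\n").split("\t", 1)[0]] += 1
    return counts


def names_only_in_one(counts_a, counts_b):
    """Return the sorted chromosome names present in exactly one of the two files."""
    return sorted(set(counts_a) ^ set(counts_b))


if __name__ == "__main__":
    chip = count_chromosomes("../sorted_H3_APAA.bed")
    control = count_chromosomes("../sorted_H3_BiB.bed")
    for name in sorted(chip):
        print(chip[name], name)
    print("Only in one file:", names_only_in_one(chip, control))
```

Comparing the two sets of names makes it easy to spot a chromosome that appears in the treatment file but not in the control, or a naming mismatch such as `2L` versus `chr2L`.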