biocore-ntnu / epic

(DEPRECATED) epic: diffuse domain ChIP-Seq caller based on SICER
http://bioepic.readthedocs.io
MIT License

Bug in multiprocessing paired-end data #33

Closed by endrebak 6 years ago

endrebak commented 8 years ago
epic -pe -t examples/chr19_sample.bedpe   -c examples/chr19_input.bedpe

works, but

epic -cpu 25 -pe -t examples/chr19_sample.bedpe   -c examples/chr19_input.bedpe  --store-matrix H3K27me3.matrix

fails! I'll try to get to the bottom of this, but the error is in a different library, not epic.

# epic -cpu 25 -pe -t examples/chr19_sample.bedpe -c examples/chr19_input.bedpe --store-matrix H3K27me3.matrix
# epic -cpu 25 -pe -t examples/chr19_sample.bedpe -c examples/chr19_input.bedpe --store-matrix H3K27me3.matrix (File: epic, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:40 )
Using paired end so setting readlength to 100. (File: epic, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:41 )
Using an effective genome fraction of 0.901962701202. (File: genomes, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:41 )
Binning examples/chr19_sample.bedpe (File: run_epic, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:41 )
Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:41 )
Making duplicated bins unique by summing them. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:48 )
Binning examples/chr19_input.bedpe (File: run_epic, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:48 )
Binning chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, M, X, Y (File: count_reads_in_windows, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:48 )
Making duplicated bins unique by summing them. (File: count_reads_in_windows, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:54 )
Merging ChIP and Input data. (File: helper_functions, Log level: INFO, Time: Thu, 11 Aug 2016 14:55:55 )
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 130, in __call__
    return self.func(*args, **kwargs)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 72, in __call__
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 72, in <listcomp>
    return [func(*args, **kwargs) for func, args, kwargs in self.items]
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/bioepic-0.1.8-py3.5.egg/epic/utils/helper_functions.py", line 19, in _merge_chip_and_input
    suffixes=[" ChIP", " Input"])
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 4437, in merge
    copy=copy, indicator=indicator)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 39, in merge
    return op.get_result()
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 217, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 353, in _get_join_info
    sort=self.sort, how=self.how)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 546, in _get_join_indexers
    llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/tools/merge.py", line 713, in _factorize_keys
    llab = rizer.factorize(lk)
  File "pandas/hashtable.pyx", line 859, in pandas.hashtable.Int64Factorizer.factorize (pandas/hashtable.c:15715)
  File "stringsource", line 644, in View.MemoryView.memoryview_cwrapper (pandas/hashtable.c:29784)
  File "stringsource", line 345, in View.MemoryView.memoryview.__cinit__ (pandas/hashtable.c:26059)
ValueError: buffer source array is read-only

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/local/home/endrebak/anaconda3/lib/python3.5/tokenize.py", line 392, in find_cookie
    line_string = line.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 24: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/local/home/endrebak/anaconda3/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 139, in __call__
    tb_offset=1)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/format_stack.py", line 373, in format_exc
    frames = format_records(records)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/format_stack.py", line 274, in format_records
    for token in generate_tokens(linereader):
  File "/local/home/endrebak/anaconda3/lib/python3.5/tokenize.py", line 514, in _tokenize
    line = readline()
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/format_stack.py", line 265, in linereader
    line = getline(file, lnum[0])
  File "/local/home/endrebak/anaconda3/lib/python3.5/linecache.py", line 16, in getline
    lines = getlines(filename, module_globals)
  File "/local/home/endrebak/anaconda3/lib/python3.5/linecache.py", line 47, in getlines
    return updatecache(filename, module_globals)
  File "/local/home/endrebak/anaconda3/lib/python3.5/linecache.py", line 136, in updatecache
    with tokenize.open(fullname) as fp:
  File "/local/home/endrebak/anaconda3/lib/python3.5/tokenize.py", line 456, in open
    encoding, lines = detect_encoding(buffer.readline)
  File "/local/home/endrebak/anaconda3/lib/python3.5/tokenize.py", line 433, in detect_encoding
    encoding = find_cookie(first)
  File "/local/home/endrebak/anaconda3/lib/python3.5/tokenize.py", line 397, in find_cookie
    raise SyntaxError(msg)
  File "<string>", line None
SyntaxError: invalid or missing encoding declaration for '/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/hashtable.cpython-35m-x86_64-linux-gnu.so'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/local/home/endrebak/anaconda3/bin/epic", line 4, in <module>
    __import__('pkg_resources').run_script('bioepic==0.1.8', 'epic')
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/setuptools-20.7.0-py3.5.egg/pkg_resources/__init__.py", line 719, in run_script
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/setuptools-20.7.0-py3.5.egg/pkg_resources/__init__.py", line 1504, in run_script
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/bioepic-0.1.8-py3.5.egg/EGG-INFO/scripts/epic", line 165, in <module>
    run_epic(args)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/bioepic-0.1.8-py3.5.egg/epic/run/run_epic.py", line 42, in run_epic
    args.number_cores)
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/bioepic-0.1.8-py3.5.egg/epic/utils/helper_functions.py", line 55, in merge_chip_and_input
    for chip_df, input_df in zip(chip_dfs, input_dfs))
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 810, in __call__
    self.retrieve()
  File "/local/home/endrebak/anaconda3/lib/python3.5/site-packages/joblib-0.9.4-py3.5.egg/joblib/parallel.py", line 727, in retrieve
    self._output.extend(job.get())
  File "/local/home/endrebak/anaconda3/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
SyntaxError: invalid or missing encoding declaration for '/local/home/endrebak/anaconda3/lib/python3.5/site-packages/pandas/hashtable.cpython-35m-x86_64-linux-gnu.so'
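What seems to be happening (a sketch, not verified against the joblib internals): for large inputs, joblib serializes worker arguments to memory-mapped files, and the NumPy arrays the workers see are read-only. The Cython typed memoryview inside pandas' `Int64Factorizer` requests a writable buffer, which a read-only array cannot provide, hence the `ValueError: buffer source array is read-only`. A pure-Python illustration of the same read-only-buffer behavior:

```python
import numpy as np

# Mimic what a worker sees after joblib memmaps a large array:
# the data is there, but the buffer is read-only.
arr = np.arange(5, dtype=np.int64)
arr.flags.writeable = False

# A memoryview over the array is acquired read-only...
view = memoryview(arr)
print(view.readonly)

# ...and any write through it is rejected. Cython's non-const typed
# memoryviews go further and refuse the buffer outright at acquisition,
# which is the ValueError buried in the traceback above.
try:
    view[0] = 42
except TypeError as exc:
    print("write rejected:", exc)
```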
endrebak commented 8 years ago

I can try to fix this by changing the code slightly, but I don't know when I'll get the time. If you hit this error, use a single core for now or make a PR :)
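One "slight change" that should sidestep the read-only buffers is a defensive copy inside the worker before merging. A sketch, not epic's actual code: the key columns `Chromosome`/`Bin` are hypothetical, only the suffixes come from the traceback above.

```python
import pandas as pd

def merge_chip_and_input(chip_df, input_df):
    # .copy() materializes fresh, writable buffers, so the Cython
    # factorizer inside DataFrame.merge no longer sees joblib's
    # read-only memmapped arrays.
    chip_df = chip_df.copy()
    input_df = input_df.copy()
    # Key columns here are illustrative; the suffixes match the call
    # in epic/utils/helper_functions.py shown in the traceback.
    return chip_df.merge(input_df, on=["Chromosome", "Bin"],
                         suffixes=[" ChIP", " Input"])
```

Another option would be constructing joblib's `Parallel` with `max_nbytes=None`, which disables the automatic memmapping of large arguments altogether, at the cost of pickling them instead.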

endrebak commented 8 years ago

@balwierz Should be fixed in 0.1.12.

endrebak commented 7 years ago

This happened again for me on Pandas 0.19.2 :/