4dn-dcic / hic2cool

Lightweight converter between hic and cool contact matrices.
MIT License

zlib.error when converting hic to cool #49

Open dejunlin opened 4 years ago

dejunlin commented 4 years ago

Hi, this is Dejun from the Noble lab. When I run

hic2cool convert test.hic test.cool

on one of my .hic files, I got this error:

##########################
### hic2cool / convert ###
##########################
### Header info from hic
... Chromosomes:  ['ALL', 'I', 'II', 'III', 'IV', 'IX', 'V', 'VI', 'VII', 'VIII', 'X', 'XI', 'XII', 'XIII', 'XIV', 'XV', 'XVI', 'M']
... Resolutions:  [500]
... Normalizations:  ['GW_SCALE', 'VC', 'VC_SQRT', 'KR', 'SCALE']
... Genome:  sacCer3
### Converting
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/dlin2/.pyenv/versions/3.8.2/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/dlin2/.pyenv/versions/3.8.2/envs/hic2cool/lib/python3.8/site-packages/hic2cool/hic2cool_utils.py", line 379, in build_counts_chunk
    records = read_block(req, block_record)
  File "/home/dlin2/.pyenv/versions/3.8.2/envs/hic2cool/lib/python3.8/site-packages/hic2cool/hic2cool_utils.py", line 232, in read_block
    uncompressedBytes = zlib.decompress(compressedBytes)
zlib.error: Error -5 while decompressing data: incomplete or truncated stream
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/dlin2/.pyenv/versions/hic2cool/bin/hic2cool", line 10, in <module>
    sys.exit(main())
  File "/home/dlin2/.pyenv/versions/3.8.2/envs/hic2cool/lib/python3.8/site-packages/hic2cool/__main__.py", line 86, in main
    hic2cool_convert(args.infile, args.outfile, args.resolution, args.nproc, args.warnings, args.silent)
  File "/home/dlin2/.pyenv/versions/3.8.2/envs/hic2cool/lib/python3.8/site-packages/hic2cool/hic2cool_utils.py", line 938, in hic2cool_convert
    tmp_chunk = parse_hic(req, pool, nproc, chr_key, unit, binsize,
  File "/home/dlin2/.pyenv/versions/3.8.2/envs/hic2cool/lib/python3.8/site-packages/hic2cool/hic2cool_utils.py", line 361, in parse_hic
    result_all.extend(mpi_result[mpi].get())
  File "/home/dlin2/.pyenv/versions/3.8.2/lib/python3.8/multiprocessing/pool.py", line 768, in get
    raise self._value
zlib.error: Error -5 while decompressing data: incomplete or truncated stream

at which point the process hangs and won't exit unless I manually press Ctrl+C. I verified that test.hic can be opened in Juicebox and everything looks fine.

I am using hic2cool version 0.8.2 and Python 3.8.2. Can anyone take a look? Here is the test.hic file that gave the error: https://drive.google.com/file/d/1zT1-R6i79hvbRK0uU-6bglILUcvCnTZ4/view?usp=sharing
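
For readers unfamiliar with the error text: Error -5 is zlib's Z_BUF_ERROR, which Python raises when the input ends before the compressed stream does. The sketch below reproduces that message with a hypothetical payload standing in for one compressed block. It only illustrates what the message means; it does not establish why hic2cool hands zlib an incomplete buffer. (Error -3, seen in a later comment, is Z_DATA_ERROR and indicates bytes that are not a valid stream; it is not reproduced here.)

```python
import zlib

# Hypothetical payload standing in for one compressed block of a .hic file.
payload = b"contact record " * 200
compressed = zlib.compress(payload)

# A complete stream round-trips without error.
assert zlib.decompress(compressed) == payload


def try_decompress(data):
    """Return (ok, detail): decompressed length on success, error text on failure."""
    try:
        return True, len(zlib.decompress(data))
    except zlib.error as exc:
        return False, str(exc)


# Dropping the tail mimics a read that returned fewer bytes than the block holds.
ok, detail = try_decompress(compressed[: len(compressed) // 2])
print(ok, detail)
```

A buffer that is cut short, or that starts at the wrong offset, fails in this way even when the file on disk is intact, which would be consistent with the file opening normally in Juicebox.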

SooLee commented 4 years ago

Can you try 0.8.3?

dejunlin commented 4 years ago

0.8.3 works with serial processing, but adding, for example, -p 4 makes it fail:

##########################
### hic2cool / convert ###
##########################
### Header info from hic
... Chromosomes:  ['ALL', 'I', 'II', 'III', 'IV', 'IX', 'V', 'VI', 'VII', 'VIII', 'X', 'XI', 'XII', 'XIII', 'XIV', 'XV', 'XVI', 'M']
... Resolutions:  [500]
... Normalizations:  ['GW_SCALE', 'VC', 'VC_SQRT', 'KR', 'SCALE']
... Genome:  sacCer3
### Converting
Traceback (most recent call last):
  File "/home/dlin2/.pyenv/versions/hic2cool/bin/hic2cool", line 8, in <module>
    sys.exit(main())
  File "/home/dlin2/.pyenv/versions/3.8.2/envs/hic2cool/lib/python3.8/site-packages/hic2cool/__main__.py", line 86, in main
    hic2cool_convert(args.infile, args.outfile, args.resolution, args.nproc, args.warnings, args.silent)
  File "/home/dlin2/.pyenv/versions/3.8.2/envs/hic2cool/lib/python3.8/site-packages/hic2cool/hic2cool_utils.py", line 936, in hic2cool_convert
    tmp_chunk = parse_hic(req, pool, nproc, chr_key, unit, binsize,
  File "/home/dlin2/.pyenv/versions/3.8.2/envs/hic2cool/lib/python3.8/site-packages/hic2cool/hic2cool_utils.py", line 362, in parse_hic
    result_all = build_counts_chunk(0, c1, c2, block_info, chr_offset_map, region_indices)
  File "/home/dlin2/.pyenv/versions/3.8.2/envs/hic2cool/lib/python3.8/site-packages/hic2cool/hic2cool_utils.py", line 377, in build_counts_chunk
    records = read_block(req, block_record)
  File "/home/dlin2/.pyenv/versions/3.8.2/envs/hic2cool/lib/python3.8/site-packages/hic2cool/hic2cool_utils.py", line 232, in read_block
    uncompressedBytes = zlib.decompress(compressedBytes)
zlib.error: Error -3 while decompressing data: invalid distance too far back
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/home/dlin2/.pyenv/versions/3.8.2/lib/python3.8/multiprocessing/util.py", line 277, in _run_finalizers
    finalizer()
  File "/home/dlin2/.pyenv/versions/3.8.2/lib/python3.8/multiprocessing/util.py", line 201, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/home/dlin2/.pyenv/versions/3.8.2/lib/python3.8/multiprocessing/pool.py", line 689, in _terminate_pool
    cls._help_stuff_finish(inqueue, task_handler, len(pool))
  File "/home/dlin2/.pyenv/versions/3.8.2/lib/python3.8/multiprocessing/pool.py", line 674, in _help_stuff_finish
    inqueue._rlock.acquire()

Anyway, the serial version is good enough for now, although it would be nice to have the parallel version working too.

mdozmorov commented 4 years ago

Reporting an identical zlib.error: Error -5 while decompressing data: incomplete or truncated stream with -p 4. Processing on a single core resolves the error.
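
A minimal sketch of scripting the workaround reported in this thread: attempt the parallel conversion with a timeout (because the failing run can hang), then fall back to a serial run. The helper names `build_command` and `convert_with_fallback` and the timeout value are hypothetical. The sketch uses only the `convert` subcommand and the `-p` option quoted above, and it assumes `-p` may be placed after the positional arguments.

```python
import os
import subprocess


def build_command(infile, outfile, nproc=None):
    """Build the hic2cool command line; omit -p for a serial run."""
    cmd = ["hic2cool", "convert", infile, outfile]
    if nproc is not None and nproc > 1:
        cmd += ["-p", str(nproc)]
    return cmd


def convert_with_fallback(infile, outfile, nproc=4, timeout_s=3600):
    """Try a parallel conversion; on failure or timeout, retry serially.

    Returns the number of processes used by the run that succeeded.
    """
    try:
        # subprocess.run kills the child process if the timeout expires.
        subprocess.run(build_command(infile, outfile, nproc),
                       check=True, timeout=timeout_s)
        return nproc
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # Remove any partial output before retrying (an assumption: the
        # failed run may have left an incomplete file behind).
        if os.path.exists(outfile):
            os.remove(outfile)
        subprocess.run(build_command(infile, outfile), check=True)
        return 1
```

Calling `convert_with_fallback("test.hic", "test.cool")` would run the two commands discussed above in order. The timeout only terminates the main hic2cool process, so worker processes left over from a hung parallel run may still need to be cleaned up by hand.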