4dn-dcic / hic2cool

Lightweight converter between hic and cool contact matrices.
MIT License

Encoding problem while converting a Juicer matrix? #56

Open chorzow opened 2 years ago

chorzow commented 2 years ago

Hello everyone,

I am trying to convert a .hic matrix that was created with the Pre command of Juicer Tools (v2.12.00). Here is the command I use for conversion:

hic2cool convert contact_map.hic contact_map.cool -r 10000

However, I am getting this error:

Traceback (most recent call last):
  File "/home/user/.conda/envs/hic2cool/bin/hic2cool", line 33, in <module>
    sys.exit(load_entry_point('hic2cool==0.8.3', 'console_scripts', 'hic2cool')())
  File "/home/user/.conda/envs/hic2cool/lib/python3.9/site-packages/hic2cool-0.8.3-py3.9.egg/hic2cool/__main__.py", line 86, in main
  File "/home/user/.conda/envs/hic2cool/lib/python3.9/site-packages/hic2cool-0.8.3-py3.9.egg/hic2cool/hic2cool_utils.py", line 859, in hic2cool_convert
  File "/home/user/.conda/envs/hic2cool/lib/python3.9/site-packages/hic2cool-0.8.3-py3.9.egg/hic2cool/hic2cool_utils.py", line 96, in read_header
  File "/home/user/.conda/envs/hic2cool/lib/python3.9/site-packages/hic2cool-0.8.3-py3.9.egg/hic2cool/hic2cool_utils.py", line 59, in readcstr
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xee in position 0: invalid continuation byte

I believe the problem is with the encoding. Can someone help me solve it?

AssumeAssume commented 2 years ago

Hello @chorzow, you are not alone. I compared .hic files produced by Juicer Tools version 2 and version 1; their headers differ, which I guess is what causes this issue. I'm still looking for a solution.
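
One way to check which format version a given file uses is to read the first bytes of its header. The sketch below assumes the .hic layout in which the file starts with the null-terminated magic string "HIC" followed by a little-endian 32-bit version number; the function name read_hic_version is hypothetical and not part of hic2cool. If the guess above is right, files written by Juicer Tools v1 and v2 should report different version numbers.

```python
import struct


def read_hic_version(path):
    """Return the format version stored at the start of a .hic file.

    Assumed layout: 4 bytes b"HIC\\x00", then a little-endian int32 version.
    """
    with open(path, "rb") as handle:
        magic = handle.read(4)
        if magic != b"HIC\x00":
            raise ValueError(f"{path} does not start with the HIC magic string")
        raw_version = handle.read(4)
        if len(raw_version) != 4:
            raise ValueError(f"{path} is too short to contain a version field")
        (version,) = struct.unpack("<i", raw_version)
    return version


# Example (file name taken from the command earlier in this thread):
# print(read_hic_version("contact_map.hic"))
```

Comparing the number printed for a file that converts cleanly with the number printed for a failing file shows whether the two were written in different format versions.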

kalavattam commented 2 years ago

Any follow up on this?

chorzow commented 2 years ago

@kalavattam No updates from the developers yet. However, I managed to convert new-format .hic files to .cool via an additional conversion step. A cooler file is created from two objects: bins, which determine the "layout" of the resulting file, and pixels, which hold the contacts themselves. For now, I convert my .hic files to .cool with the following procedure:

  1. Decide how the bins object will be organized and obtain it (either hardcode it or take it from an existing .cool file of the same organism);
  2. Dump a matrix with the latest juicer_tools.jar and treat it as the pixels object;
  3. Create a .cool file via cooler.create_cooler() from the resulting bins and pixels. See the cooler API reference for further details.

That might not be the best way to do it, but it works. Another option is to keep two jar files of juicer_tools (v1 and v2) and use whichever one you need for conversion or for display in Juicebox.
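
The three steps above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the exact script used in this thread: the dump is assumed to be raw counts (no normalization) for one chromosome pair at a single resolution, with three whitespace-separated columns (pos1, pos2, count) where positions are bin starts in bp. The helper names (make_bins, bin_offsets, dump_to_pixels, write_cool) and the chromosome names and sizes are hypothetical. Only cooler.create_cooler() is a real cooler call; pandas and cooler are imported lazily so the helpers can be tried without them installed.

```python
def make_bins(chromsizes, binsize):
    """Step 1: fixed-width bins as (chrom, start, end), in chromosome order."""
    bins = []
    for chrom, length in chromsizes.items():
        for start in range(0, length, binsize):
            bins.append((chrom, start, min(start + binsize, length)))
    return bins


def bin_offsets(chromsizes, binsize):
    """Index of each chromosome's first bin in the genome-wide bin table."""
    offsets, total = {}, 0
    for chrom, length in chromsizes.items():
        offsets[chrom] = total
        total += -(-length // binsize)  # ceiling division
    return offsets


def dump_to_pixels(lines, chrom1, chrom2, offsets, binsize):
    """Step 2: turn 'pos1 pos2 count' dump lines into (bin1_id, bin2_id, count)."""
    pixels = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        bin1 = offsets[chrom1] + int(fields[0]) // binsize
        bin2 = offsets[chrom2] + int(fields[1]) // binsize
        if bin1 > bin2:  # cooler stores the upper triangle only
            bin1, bin2 = bin2, bin1
        pixels.append((bin1, bin2, int(float(fields[2]))))
    return sorted(pixels)


def write_cool(path, bins, pixels):
    """Step 3: write the .cool file from the bins and pixels tables."""
    import pandas as pd  # third-party, imported lazily
    import cooler

    bins_df = pd.DataFrame(bins, columns=["chrom", "start", "end"])
    pixels_df = pd.DataFrame(pixels, columns=["bin1_id", "bin2_id", "count"])
    cooler.create_cooler(path, bins_df, pixels_df)


# Example with made-up chromosome sizes and a hypothetical dump file name:
# sizes = {"chr1": 1_000_000, "chr2": 800_000}
# offsets = bin_offsets(sizes, 10_000)
# with open("chr1_chr1_10kb.txt") as fh:
#     pixels = dump_to_pixels(fh, "chr1", "chr1", offsets, 10_000)
# write_cool("contact_map.cool", make_bins(sizes, 10_000), pixels)
```

For a whole genome, the pixels from every dumped chromosome pair would need to be concatenated and sorted before calling write_cool, and the chromosome order in the sizes table has to match the order intended for the final file.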

tonyjhlam commented 2 years ago

@chorzow Can you upload an example of this? I am trying to resolve this issue and cannot reproduce your solution.

nikhilp11 commented 1 year ago

How can we convert a v2 .hic file to a v1 .hic file? Can we downgrade the .hic file?

ratheraarif commented 1 year ago

I encountered a similar problem while trying to generate a .cool file from .hic data. I worked around it by using juicer_tools version 1.22.01 when creating the .hic file. You can choose the appropriate version here: https://github.com/aidenlab/juicer/wiki/Download