GangCaoLab / CoolBox

Jupyter notebook based genomic data visualization toolkit.
https://gangcaolab.github.io/CoolBox/index.html
GNU General Public License v3.0

Error in plotting .mcool file #101

Closed hamy12398 closed 1 month ago

hamy12398 commented 2 months ago

When I tried to plot the Hi-C data (.mcool file) with:

frame = XAxis() + HiCMat(f"{DATA_DIR}/4DNFIZL8OZE1.mcool", style='matrix', color_bar='horizontal')
frame.plot(TEST_RANGE)

I got the error below:

matrix_val_range()] zero-size array to reduction operation minimum which has no identity

I looked inside my mcool file for bin data and contact counts, and found this:

         chrom     start       end   KR   VC  VC_SQRT  weight
0         chr1         0      1000  NaN  0.0      0.0     NaN
1         chr1      1000      2000  NaN  0.0      0.0     NaN
2         chr1      2000      3000  NaN  0.0      0.0     NaN
3         chr1      3000      4000  NaN  0.0      0.0     NaN
4         chr1      4000      5000  NaN  0.0      0.0     NaN
...        ...       ...       ...   ..  ...      ...     ...
3088276   chrY  57223000  57224000  NaN  0.0      0.0     NaN
3088277   chrY  57224000  57225000  NaN  0.0      0.0     NaN
3088278   chrY  57225000  57226000  NaN  0.0      0.0     NaN
3088279   chrY  57226000  57227000  NaN  0.0      0.0     NaN
3088280   chrY  57227000  57227415  NaN  0.0      0.0     NaN

[3088281 rows x 7 columns]

        bin1_id  bin2_id  count
0            66  1054927      1
1           101  2962335      1
2           116  1091640      1
3           125  1235113      1
4           180  1012539      1
...         ...      ...    ...
450362  3087889  3087922      1
450363  3087895  3087896      1
450364  3087903  3087903      1
450365  3087911  3087911      1
450366  3087938  3087938      1

Can you help or suggest a way to fix this error? Thank you.
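
For context, "zero-size array to reduction operation minimum which has no identity" is the ValueError that NumPy raises when a minimum is taken over an empty array. One way this can happen is when every value in the plotted region is missing, so nothing is left once missing values are filtered out. The sketch below illustrates that failure mode and a simple guard in plain Python. It is not CoolBox's actual `matrix_val_range` code, and `value_range` is a hypothetical name used only for illustration.

```python
import math


def value_range(values):
    """Return (min, max) over the finite values, or None if there are none.

    Hypothetical helper for illustration; not part of CoolBox.
    """
    # Drop NaN and infinite entries, as a plotting routine might before scaling colors.
    finite = [v for v in values if not (math.isnan(v) or math.isinf(v))]
    if not finite:
        # Reducing an empty collection with min() is what raises the error,
        # so report "no data" instead of attempting the reduction.
        return None
    return min(finite), max(finite)


# A region where every value is missing yields no usable range.
print(value_range([float("nan"), float("nan")]))
# A region with some real values yields a normal range.
print(value_range([1.0, float("nan"), 3.0]))
```

If a check like this reports no usable values for a region, the input file or the chosen region is the first thing to inspect.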

mdozmorov commented 1 month ago

We tested two files:

|-- CoolBox_ex
|   `-- cool_chr9_4000000_6000000.mcool - downloaded from https://github.com/GangCaoLab/CoolBox/blob/master/tests/test_data/cool_chr9_4000000_6000000.mcool
`-- Gm12878
    `-- 4DNFIZL8OZE1.mcool - downloaded from https://data.4dnucleome.org/files-processed/4DNFIZL8OZE1/

The most informative step was comparing the metadata of the two files, using the function below (generated by ChatGPT). It appears that the CoolBox file is format-version: 2, while the 4D Nucleome one is format-version: 3, and the two differ considerably. I'm not an expert in the .cool format and have limited Python knowledge. How should we convert 4D Nucleome files to be compatible with CoolBox?

import h5py

# Function to print metadata
def print_metadata(file):
    with h5py.File(file, 'r') as f:
        print(f"File: {file}")
        for key in f.attrs.keys():
            print(f"{key}: {f.attrs[key]}")
        for name, group in f.items():
            print(f"\nGroup: {name}")
            for key in group.attrs.keys():
                print(f"{key}: {group.attrs[key]}")
            for subgroup in group:
                print(f"  Subgroup: {subgroup}")
                for key in group[subgroup].attrs.keys():
                    print(f"    {key}: {group[subgroup].attrs[key]}")

mdozmorov commented 1 month ago

Minimal code to reproduce:

import coolbox
from coolbox.api import *
import os
# Downloaded from https://data.4dnucleome.org/files-processed/4DNFIZL8OZE1/
mcool_file = "/path/to/4DNFIZL8OZE1.mcool"
TEST_RANGE = "chr9:4000000-6000000"
resolution = 10000  # Adjust resolution as needed (note: defined here but not passed to HiCMat below)
frame = XAxis() + HiCMat(mcool_file, style='matrix', color_bar='horizontal')
frame.plot(TEST_RANGE)

zhqu1148980644 commented 1 month ago

I examined your data. The error may be caused by a data issue (a blank region) rather than by the plotting itself. I checked on 4DN and found that this is a blank area.

mdozmorov commented 1 month ago

Everything works. Apparently, the file we used had been modified and became empty. I downloaded the file myself and it works. @hamy12398, please close the issue.
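
A quick way to tell whether a downloaded .mcool file is effectively empty in the region being plotted is to open one resolution with the cooler library and look at the raw counts. The following is a minimal sketch, not something taken from this thread. The names `has_contacts` and `check_region` are hypothetical, and it assumes that cooler is installed and that the file contains the requested resolution.

```python
def has_contacts(counts):
    """Return True if any entry in a 2D grid of raw counts is non-zero."""
    return any(value != 0 for row in counts for value in row)


def check_region(mcool_path, resolution, region):
    """Report whether `region` holds any contacts at `resolution` (hypothetical helper)."""
    # Imported lazily so has_contacts() can be used without cooler installed.
    import cooler

    # Multi-resolution files are addressed as "<file>::resolutions/<bin size>".
    clr = cooler.Cooler(f"{mcool_path}::resolutions/{resolution}")
    # "nnz" is the number of non-zero pixels stored for this resolution.
    print("non-zero pixels (nnz):", clr.info["nnz"])
    # Fetch unbalanced counts so that missing balancing weights do not turn values into NaN.
    raw = clr.matrix(balance=False).fetch(region)
    return has_contacts(raw)


# Example call (path is a placeholder; assumes a 10 kb resolution exists in the file):
# check_region("/path/to/4DNFIZL8OZE1.mcool", 10000, "chr9:4000000-6000000")
```

A very low `nnz` for a whole-genome file, or a region with no contacts at all, suggests re-downloading the file before debugging the plotting code.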

mdozmorov commented 1 month ago

Thank you all for your help!