4dn-dcic / hic2cool

Lightweight converter between hic and cool contact matrices.
MIT License

Preserve chromosome names from hic #34

Closed carlvitzthum closed 5 years ago

carlvitzthum commented 5 years ago

Issue raised via email:

I am using your hic2cool pipeline and encountering an issue. I get the .cool file, but the chromosome names are different. The chr_size.txt file I used to generate the hic file has names like chr1, chr2, and so on, but the cool file has names 1, 2, 3, etc. The 'chr' prefix is missing from the names. Can you suggest a way to preserve the original chromosome names in this conversion?

Will try to get OP to respond to this issue and investigate soon.

carlvitzthum commented 5 years ago

I am having trouble reproducing this issue with hic2cool 0.7.2. Using the test hic file in this repo (test_data/test_hic.hic), hic2cool convert reports the following chromosome names from the hic file:

... Chromosomes:  ['All', 'chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY', 'chrMT']

Then, opening a cooler using one of the resolutions from the resulting multi-res cooler seems to preserve chromosome names correctly:

>>> import cooler
>>> cool = cooler.Cooler('path_to_test_mcool.mcool::resolutions/5000')
>>> cool.chromnames
['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY', 'chrMT']

carlvitzthum commented 5 years ago

The issue was traced to the chromosome names in the input hic file itself. hic2cool was working as expected.
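
Since the names are carried over from the hic file, one possible workaround is to rename the chromosomes in the resulting cooler afterwards. The following is only a sketch under stated assumptions, not something confirmed in this thread: `build_chr_prefix_map` and `rename_chroms_in_place` are hypothetical helper names, and the second function assumes an installed cooler version that provides `cooler.rename_chroms(clr, rename_dict)`, which modifies the file in place.

```python
def build_chr_prefix_map(names, prefix="chr"):
    """Return {old_name: new_name} for every name that lacks the prefix.

    Hypothetical helper: names that already start with the prefix are
    left out of the mapping, so an empty dict means nothing to rename.
    """
    return {
        name: prefix + name
        for name in names
        if not name.startswith(prefix)
    }


def rename_chroms_in_place(cool_uri, prefix="chr"):
    """Add the prefix to chromosome names in a cooler, modifying it in place.

    cool_uri is a cooler URI, for example a single-resolution .cool path or
    'file.mcool::resolutions/5000' for one resolution of a multi-res cooler.
    """
    # Imported lazily so build_chr_prefix_map works without cooler installed.
    import cooler

    clr = cooler.Cooler(cool_uri)
    mapping = build_chr_prefix_map(clr.chromnames, prefix)
    if mapping:
        # Assumption: this cooler version exposes rename_chroms(clr, rename_dict).
        cooler.rename_chroms(clr, mapping)
    # Reopen the file to read back the names as stored on disk.
    return cooler.Cooler(cool_uri).chromnames
```

For a multi-res cooler, each resolution would presumably need to be renamed separately, and it is safer to work on a copy of the file because the change is written in place.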