4dn-dcic / hic2cool

Lightweight converter between hic and cool contact matrices.
MIT License

wrong chromosome names appear in warning messages #32

Closed by balwierz 5 years ago

balwierz commented 5 years ago

I am converting a hic file. The genome is mouse (mm9), which has 19 autosomes, chrX, chrY, chrM and some *_random scaffolds. However, I am getting warning messages that make no sense:

!!! WARNING. Normalization vector VC does not exist for chr idx 22.
!!! WARNING. Normalization vector VC_SQRT does not exist for chr idx 22.
!!! WARNING. Normalization vector KR does not exist for chr idx 22.
... The intersection between chr 1 and chr 22 cannot be found in the hic file.

I opened the binary .hic file and looked at the header: the chromosomes are indeed named UCSC-style (chr1..chr19, chrX, chrY, chrM). I peeked into the code and realised that c1 and c2 (passed as a single "_"-concatenated string!) are probably array indices rather than chromosome names.

I propose two solutions: 1) the proper one: report the real chromosome names from the hic file instead of "chr "; 2) a quick fix: spell out "contig number" explicitly instead of "chr" in the warning messages to remove the confusion.
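To illustrate the two proposals, here is a minimal Python sketch. It is not hic2cool's actual code: the function `format_norm_warning`, its parameters, and the `chr_names` list are hypothetical, and the list assumes the header order starts with an "All" entry at index 0 followed by chr1..chr19, chrX, chrY, chrM. The sketch prefers the real name when the index can be resolved and falls back to the "contig number" wording otherwise.

```python
def format_norm_warning(norm, chr_idx, chr_names):
    """Build a warning string for a missing normalization vector.

    Hypothetical helper for illustration only, not part of hic2cool.
    chr_names is assumed to be the list of chromosome names in header order.
    """
    if 0 <= chr_idx < len(chr_names):
        # Proposal 1: report the real chromosome name (and keep the index for debugging).
        label = f"chromosome {chr_names[chr_idx]} (index {chr_idx})"
    else:
        # Proposal 2: if the name is unknown, at least say the number is an index.
        label = f"contig number {chr_idx}"
    return f"!!! WARNING. Normalization vector {norm} does not exist for {label}."


# Assumed header order for an mm9 file; the leading "All" entry is an assumption.
chr_names = ["All"] + [f"chr{i}" for i in range(1, 20)] + ["chrX", "chrY", "chrM"]

print(format_norm_warning("KR", 22, chr_names))
print(format_norm_warning("KR", 99, chr_names))
```

Under that assumed ordering, index 22 resolves to chrM, so the message would name a chromosome that actually exists in the file instead of a nonexistent "chr 22".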

carlvitzthum commented 5 years ago

Hi @balwierz

Thanks for the helpful feedback. It was an easy addition to report the real chromosome name for that warning, so I added that and released hic2cool 0.7.2. Please let me know if the change does not work for you or other issues come up.

Best, Carl