deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

hicexport to .cool #258

Closed LeilyR closed 6 years ago

LeilyR commented 6 years ago

Hi, while exporting an .h5 matrix from the fly genome to .cool, I noticed that the .cool format has difficulty accepting the chromosome names, since they are a combination of numbers and letters. In addition, the 'chr' prefix is not present in the h5 file, and I think it is not added when the format is changed to .cool. So, technically, the cool matrix generated by hicexport cannot be used by any of the cooler tools. For the first issue I don't know whether you can change anything, but for the latter I assume it would be possible to change the names to chr+number rather than the number only. Also, has any of you used cooler on the fly genome before and can tell me how to overcome this issue?

joachimwolff commented 6 years ago

Hi Leily,

as far as I know, we build the matrix using the chromosome names provided by the BAM files, and therefore by the reference genome you used for mapping. 'chr' is only included in the names if you use UCSC reference genomes, but you at MPI use Ensembl. The conversion is not trivial; see e.g. https://github.com/dpryan79/ChromosomeMappings/blob/master/GRCh37_UCSC2ensembl.txt, where 90+ lines of conversion are needed. And yes, we do not convert the names when we export from h5 to cool; we just take what is there. I think the issue is not that others can't use the file, but rather that other tools expect UCSC annotation while you provide Ensembl.

Best, Joachim
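A minimal sketch of the renaming step discussed above, assuming a two-column, tab-separated table in the style of the ChromosomeMappings files. The table contents, `load_mapping` and `rename_chromosomes` are hypothetical names used for illustration; this is not part of hicexport or cooler, and it only shows the lookup, not the rewriting of the matrix file itself.

```python
import csv
import io

# Illustrative table (Ensembl name <TAB> UCSC name) for a few fly chromosome
# arms. This is an assumption for the example; a real table would be read
# from a mapping file.
EXAMPLE_TABLE = "2L\tchr2L\n2R\tchr2R\n3L\tchr3L\n3R\tchr3R\n4\tchr4\nX\tchrX\n"


def load_mapping(handle):
    """Read a two-column, tab-separated table into a dict."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and rows without a target name.
        if len(row) < 2 or not row[1].strip():
            continue
        mapping[row[0].strip()] = row[1].strip()
    return mapping


def rename_chromosomes(names, mapping):
    """Translate chromosome names, failing loudly on anything unmapped."""
    missing = [name for name in names if name not in mapping]
    if missing:
        raise KeyError("no mapping for: " + ", ".join(missing))
    return [mapping[name] for name in names]


mapping = load_mapping(io.StringIO(EXAMPLE_TABLE))
print(rename_chromosomes(["2L", "3R", "X"], mapping))
# prints ['chr2L', 'chr3R', 'chrX']
```

Raising on unmapped names rather than passing them through is a deliberate choice here: scaffolds or arms missing from the table are exactly the cases that would otherwise fail silently in a downstream tool.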

vivekbhr commented 6 years ago

@LeilyR to sort out whether the issue is UCSC vs Ensembl or Drosophila vs human, could you try using the UCSC version of the Drosophila annotations?

LeilyR commented 6 years ago

It has to do with both. The UCSC vs Ensembl issue was easy for me to resolve, but the Drosophila vs human one is not, and I don't know if you can do much about it. Maybe the ".cool" people should change their code, since it is written very specifically when it comes to the chr names.

joachimwolff commented 6 years ago

Does this issue still exist?

LeilyR commented 6 years ago

I am no longer using the .cool output of HiCExplorer as input for cooltools, but I assume that anyone who wants to use it has to modify the cooltools code to make it work, first for the fly chromosome arms and second for the Ensembl format vs the UCSC format.