deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

Error when converting from hic or cool to h5 #347

Closed svm-zhang closed 5 years ago

svm-zhang commented 5 years ago

Hello,

Thanks for HiCExplorer!

I'd like to report one problem I had when trying to convert from hic or cool format to hdf5. Here is what I did:

  1. I have hic files that I'd like to convert to hdf5 so that I can use HiCExplorer for some downstream analyses. I used the command below:
hicConvertFormat --matrices something.hic --outFileName something.h5 --inputFormat hic --outputFormat h5

The command ran without any error messages but did not yield any output.

  2. I then tried converting to the cool format first, from which I should still be able to generate the h5 format. I used the following commands:
hicConvertFormat --matrices something.hic --outFileName something.cool --inputFormat hic --outputFormat cool

hicConvertFormat --matrices something.mcool --outFileName something.h5 --inputFormat cool --outputFormat h5

The first command worked perfectly fine, while the second one gave me the following error (I don't paste the whole traceback here):

KeyError: "Unable to open object (object 'chroms' doesn't exist)"

During handling of the above exception, another exception occurred:

AttributeError: module 'cooler' has no attribute 'fileops'

I have looked around and found people had the same problem, and with v2.2.1 this should not happen.

Could you please take a look and let me know what you think? I can get you more info if you need.

Thanks in advance.

Simo

joachimwolff commented 5 years ago

Hi Simo,

Thanks for using HiCExplorer.

HiCExplorer supports the cool file format natively, so there is no need to convert a cool file to h5 to use the HiCExplorer tools. Concerning your observations: converting from hic to cool is the only operation we support for hic files; converting hic to h5 or anything else is not possible. Once you have the cool file we support it, but there is one thing to know: hic files usually do not contain a single Hi-C matrix; they hold many resolutions and different correction factors. If you run the hic to cool conversion without the resolution parameter set to exactly one value, the output will be a multicooler file (it should be named mcool, but the library we use names everything cool). To deal with this, you can run hicInfo to see which resolutions are stored and which correction factors are available.

I recommend the following:

  1. Run hicConvertFormat for each resolution you would like to use: `hicConvertFormat -m your.hic --inputFormat hic --outputFormat cool -o your-cool-file-10kb.cool -r 10000`.
  2. Run hicInfo to see which correction factors are available: `hicInfo -m your-cool-file-10kb.cool`
  3. Apply the correction factors, e.g. `hicConvertFormat -m your-cool-file-10kb.cool --inputFormat cool --outputFormat cool -o your-cool-file-10kb-KR.cool --correction_name KR`
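
The three steps above could be wrapped in a small shell helper. This is only a sketch: the function name `print_hic_to_cool_steps` and the output file naming scheme are assumptions made for illustration, while the commands and flags are the ones quoted in the steps. It prints the commands rather than running them, so they can be reviewed first.

```shell
#!/bin/sh
# Sketch: print the per-resolution workflow for one hic file.
# The function name and the output naming scheme are illustrative assumptions.
set -eu

print_hic_to_cool_steps() {
    hic_file=$1      # input, e.g. your.hic
    resolution=$2    # bin size in bp, e.g. 10000
    correction=$3    # correction name reported by hicInfo, e.g. KR
    base="${hic_file%.hic}-${resolution}"

    # Step 1: extract a single resolution into its own cool file.
    echo "hicConvertFormat -m $hic_file --inputFormat hic --outputFormat cool -o $base.cool -r $resolution"
    # Step 2: inspect the matrix, including the available correction factors.
    echo "hicInfo -m $base.cool"
    # Step 3: apply one correction and write the corrected matrix.
    echo "hicConvertFormat -m $base.cool --inputFormat cool --outputFormat cool -o $base-$correction.cool --correction_name $correction"
}

print_hic_to_cool_steps your.hic 10000 KR
```

In practice the output of step 2 is worth reading before choosing the correction name for step 3.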

I decided on this workflow for HiCExplorer because in Hi-C data analysis you usually work with one resolution and one applied correction. Comparing many resolutions and correction methods is done at the beginning; once you have figured out what works best, you use that one resolution and not many. Multiple resolutions are useful for interactively exploring the data with tools like HiGlass. The last reason is that it makes the source code less complex and easier to maintain.

In case you prefer to access a cool matrix stored inside a multicool file, hicInfo will give you the correct paths. You have to specify the path to access it; paths usually look as follows: `hicInfo -m your-multi-cool-file.cool::/resolutions/10000`
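
As a sketch of how such paths can be put together for several resolutions: the helper name `cool_uri` and the list of resolutions are assumptions for illustration, and the `::/resolutions/<bin size>` layout is only the usual one mentioned above, so the actual paths should be confirmed with hicInfo on your own file.

```shell
#!/bin/sh
# Sketch: build "file::/resolutions/N" paths for a multicool file.
# cool_uri is a hypothetical helper; the resolutions listed are examples only.
set -eu

cool_uri() {
    # $1: multicool file, $2: resolution in bp
    printf '%s::/resolutions/%s\n' "$1" "$2"
}

for resolution in 10000 50000 100000; do
    # Print the hicInfo call for each example resolution.
    echo "hicInfo -m $(cool_uri your-multi-cool-file.cool "$resolution")"
done
```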

I hope this clarifies a bit how to handle cool files coming from hic files.

Best,

Joachim

svm-zhang commented 5 years ago

Hello Joachim,

Thank you so much for explaining in such great detail. It definitely cleared up many of my questions about these formats. I agree with you that we only need to decide on one resolution and one correction method for the downstream analyses.

I was able to get the first two commands running without any problems. The third one initially complained that module 'cooler' has no attribute 'create_cooler', but the error went away after I upgraded my cooler package from 0.7.11 to 0.8.3. If this is the cause, perhaps you could update the installation doc to say that hicexplorer 2.2.1 works with cooler 0.8.3 (it currently says >= 0.7.11).

Thanks again. Simo

joachimwolff commented 5 years ago

Hi Simo,

great to hear it is working now.

Concerning the cooler version: yes, you need at least 0.8.2, and I am a bit surprised this dependency update was not enforced. In setup.py and the requirements we define cooler 0.8.2 and hicmatrix version 7, which also enforces 0.8.2. The conda packages define 0.8.2 too.
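
A quick way to check this requirement before running the tools is to compare the installed cooler version against 0.8.2. The sketch below rests on assumptions: `version_ge` is a hypothetical helper that handles plain `major.minor.patch` numbers only, and the check relies on `cooler.__version__` being readable from whichever `python` is on the PATH.

```shell
#!/bin/sh
# Sketch: check that the installed cooler is at least the required version.
# version_ge is a hypothetical helper, not part of HiCExplorer or cooler.
set -eu

version_ge() {
    # Succeeds if dotted version $1 >= $2 (numeric major.minor.patch only).
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$lowest" = "$2" ]
}

required=0.8.2
# Falls back to 0.0.0 if python or cooler is not available.
installed=$(python -c 'import cooler; print(cooler.__version__)' 2>/dev/null || echo 0.0.0)

if version_ge "$installed" "$required"; then
    echo "cooler $installed satisfies >= $required"
else
    echo "cooler $installed is older than $required (or not installed)"
fi
```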

Can you tell me how you installed HiCExplorer and where you found the information that version 0.7.11 is still allowed?

Best,

Joachim

svm-zhang commented 5 years ago

Hello Joachim,

I did the installation twice, each time using a different procedure (I was trying really hard to make sure the error I saw did not come from the installation), which I think is what caused the problem. To answer your second question first, I found the information about >=0.7.11 on the [readdoc page](https://hicexplorer.readthedocs.io/en/latest/content/installation.html).

  1. I started by installing hicexplorer from the source tarball. It prompted me for a bunch of required packages, and I installed each of them individually (also from source). During that time, I installed cooler 0.8.3.

  2. I also tried to install hicexplorer using pip. It pulled down hicexplorer v2.1.1 (according to another ticket, this version should fix the problem I had, although it's not quite relevant), as well as cooler v0.7.11. I removed my previous installation before this second attempt.

Since at that point I still could not get past my error, I re-installed v2.2.1 without cooler v0.8.3, and that gave me the error I saw early this morning. I hope I have explained myself clearly. If I had sent my problem to you earlier, I could have avoided all of this mess :)

Simo

joachimwolff commented 5 years ago

Hi Simo,

We recommend using conda and the bioconda channel (http://bioconda.github.io/) to install HiCExplorer. Conda is our preferred package manager because it resolves all our dependencies automatically. With upcoming changes we will have dependencies which pip cannot resolve but conda can. It is quite likely we will drop support for pip this year.

I updated the pip version yesterday for Python 2.7 to version 2.2.1. Please let me know if it works and solves the errors you had. Concerning our documentation, I have to apologize: it is sometimes a bit outdated, as I always forget to update the dependency list there :( However, the requirements.txt file is usually up to date.

Best,

Joachim

svm-zhang commented 5 years ago

Hello Joachim,

Thanks for updating the pip version. It works for me perfectly now. I will use the requirements.txt from now on to track all versions.

I will also try the bioconda channel in the near future. In my limited experience with conda and/or bioconda, I can't use them without root privileges. I'm not sure if this is correct though (perhaps there is a --prefix option). This can be a pain if I want to do something but have to wait for an admin. That's why I always install tools from source :)

Simo

joachimwolff commented 5 years ago

As far as I know you don't need root privileges for conda. Everything is installed in your home directory. Just download miniconda (https://conda.io/en/latest/miniconda.html) and add the bioconda and conda-forge channels.
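
These steps can be sketched as follows. This assumes conda (for example from miniconda) is already installed in your home directory; the function name `print_conda_setup` and the environment name `hicexplorer` are arbitrary choices for illustration. The function only prints the commands so they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch: print the conda commands for a user-level HiCExplorer install.
# The function name and the environment name are illustrative assumptions.
set -eu

print_conda_setup() {
    env_name=${1:-hicexplorer}
    # Add the two channels mentioned above.
    echo "conda config --add channels bioconda"
    echo "conda config --add channels conda-forge"
    # Create a dedicated environment; no root privileges are involved.
    echo "conda create -n $env_name hicexplorer"
}

print_conda_setup
```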

svm-zhang commented 5 years ago

You are right. I was able to install miniconda first and then add bioconda. From there, installing hicexplorer was painless. Thanks very much for the help all the way.

Simo

kalavattam commented 5 years ago

In case it's useful for the community, I want to follow up on one comment from Joachim:

2. run hicInfo to see which correction factors are available: `hicInfo -m your-cool-file-10kb.cool`

The correction factors aren't found in the main output from hicInfo; instead, they're listed as available columns in stderr.

joachimwolff commented 5 years ago

@kalavattam With version 2.2.1 I changed it to normal stdout (more precisely, it is now printed with the print() function instead of log.info()). However, if you noticed the previous behaviour, you were probably piping the output to a file. In the same version (2.2.1) I added an option to write the output of hicInfo to a file instead of to the terminal.
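
The difference is easy to reproduce in a shell. In the sketch below, `old_style_tool` is a made-up stand-in (not hicInfo itself) that writes its main report to stdout and one extra line to stderr, the way logging output typically behaves; the messages are invented. Redirecting with `>` alone captures only stdout, while adding `2>&1` captures both streams.

```shell
#!/bin/sh
# Sketch: why stderr lines go missing when only stdout is redirected.
# old_style_tool is a hypothetical stand-in, not a real HiCExplorer command.
set -eu

old_style_tool() {
    echo "main report"                   # written to stdout
    echo "extra info from logging" >&2   # written to stderr
}

tmpdir=$(mktemp -d)

# Only stdout is redirected: the stderr line still goes to the terminal.
old_style_tool > "$tmpdir/stdout_only.txt"

# Both streams are redirected into the same file.
old_style_tool > "$tmpdir/both.txt" 2>&1

echo "lines with '>' only: $(grep -c '' "$tmpdir/stdout_only.txt")"
echo "lines with '> ... 2>&1': $(grep -c '' "$tmpdir/both.txt")"

rm -rf "$tmpdir"
```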

kalavattam commented 5 years ago

Thanks, Joachim.

However, if you have noticed the previous behaviour you probably piped the output to a file.

Yeah, I was doing that.

I have upgraded to version 2.2.1 now.