deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

Integrate methods from HiFive? #246

Closed jxtx closed 6 years ago

jxtx commented 6 years ago

Hi All,

We (mostly @msauria) maintain another suite of 5C and HiC analysis tools implemented in Python (https://github.com/bxlab/hifive). We're interested in exploring whether some of the novel methods we've implemented there could be moved into HiCExplorer so that we don't have to maintain a lot of redundant code. The key pieces include

To get started, is there documentation on the Python representation of HiC data and the file formats? (I see lots of fantastic documentation for the commands but I'm not finding documentation of the internals, probably just not looking in the right place!)

Thanks!

bgruening commented 6 years ago

Thanks @jxtx and @msauria for reaching out. This sounds like a great opportunity! We have a public holiday here, but we will get back to you next week.

Thanks a lot!

joachimwolff commented 6 years ago

Hi James,

I like your idea of integrating novel features of HiFive into HiCExplorer. To be honest, our internal documentation is not that great. Here is what you need to know to take the first steps:

  1. All tools are written as independent scripts.
  2. They all use the class 'hiCMatrix' in the file 'HiCMatrix.py'. This is the data object that holds the Hi-C matrix. To open and save a matrix, e.g. a cooler file, use:
from hicexplorer import HiCMatrix as hm
hic_ma = hm.hiCMatrix('matrix.cool')
hic_ma.save('newMatrix.cool')

The binned Hi-C contact matrix is a SciPy csr_matrix and is accessible via:

hic_ma.matrix
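For readers less familiar with SciPy's sparse formats: a csr_matrix stores only the non-zero entries, in three arrays that SciPy exposes as matrix.data, matrix.indices and matrix.indptr. The following is a minimal pure-Python sketch of that layout for a tiny symmetric contact matrix with made-up counts. The helper names dense_to_csr and csr_get are hypothetical and not part of HiCExplorer.

```python
def dense_to_csr(dense):
    """Convert a dense list-of-lists matrix into CSR arrays (data, indices, indptr)."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:            # CSR keeps only non-zero entries
                data.append(value)
                indices.append(col)   # column index of each stored value
        indptr.append(len(data))      # row i spans data[indptr[i]:indptr[i + 1]]
    return data, indices, indptr


def csr_get(data, indices, indptr, row, col):
    """Return the value at (row, col), or 0 if that entry is not stored."""
    for k in range(indptr[row], indptr[row + 1]):
        if indices[k] == col:
            return data[k]
    return 0


# A tiny symmetric 3-bin contact matrix (illustrative counts, not real data).
contacts = [
    [10, 4, 0],
    [4, 12, 2],
    [0, 2, 8],
]
data, indices, indptr = dense_to_csr(contacts)
print(data)     # [10, 4, 4, 12, 2, 2, 8]
print(indices)  # [0, 1, 0, 1, 2, 1, 2]
print(indptr)   # [0, 2, 5, 7]
print(csr_get(data, indices, indptr, 1, 2))  # 2
```

With the real object you would simply index hic_ma.matrix[i, j]; the sketch only shows why a sparse format suits Hi-C matrices, which typically contain many zero entries at high resolution.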

The mapping to chromosomes and positions to the bins is stored in:

hic_ma.cut_intervals

It is a list of tuples, one per bin, e.g. [('chr1', 0, 20, 1.0), ('chr1', 20, 40, 1.0)]. To get the range of matrix bin indices covering a chromosome region, call:

hic_ma.getRegionBinRange(chrname, start, end)

To get the chromosome position of a bin index:

hic_ma.getBinPos(index)
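To make the relationship between cut_intervals and bin indices concrete, here is a hedged pure-Python sketch of both lookups over a hypothetical cut_intervals list. region_bin_range and bin_pos are illustrative stand-ins for getRegionBinRange and getBinPos, not their actual implementations; in particular, the end-exclusive return value chosen here is an assumption and may differ from HiCExplorer's convention.

```python
# Hypothetical cut_intervals: one (chrom, start, end, extra) tuple per bin,
# ordered by bin index.
cut_intervals = [
    ("chr1", 0, 20, 1.0),
    ("chr1", 20, 40, 1.0),
    ("chr1", 40, 60, 1.0),
    ("chr2", 0, 20, 1.0),
    ("chr2", 20, 40, 1.0),
]


def region_bin_range(intervals, chrom, start, end):
    """Return (first_bin, last_bin_exclusive) for bins overlapping [start, end)
    on chrom, or None if no bin overlaps. Uses a linear scan for clarity."""
    hits = [i for i, (c, s, e, _) in enumerate(intervals)
            if c == chrom and s < end and e > start]
    if not hits:
        return None
    return hits[0], hits[-1] + 1


def bin_pos(intervals, index):
    """Return the (chrom, start, end, extra) tuple describing bin `index`."""
    return intervals[index]


print(region_bin_range(cut_intervals, "chr1", 15, 45))  # (0, 3)
print(region_bin_range(cut_intervals, "chr2", 0, 20))   # (3, 4)
print(bin_pos(cut_intervals, 4))                        # ('chr2', 20, 40, 1.0)
```

A linear scan keeps the sketch short; since bins are sorted by chromosome and position, a real implementation could instead use a per-chromosome index or binary search.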
  3. Functions which are used by more than one script are collected in utilities.py.

  4. To contribute, please make sure linting passes; we test with:

flake8 . --exclude=.venv,.build,planemo_test_env,build --ignore=E501,F403,E402,F999,F405,E712

Moreover, your test cases need to pass both locally and on Travis:

py.test hicexplorer --doctest-modules

The test files are located in hicexplorer/test and follow the naming scheme test_hicNAME.py; test data is stored in hicexplorer/test/test_data/. Please check whether you can reuse the provided data.
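As a rough illustration of these testing conventions, here is a sketch of what a hypothetical hicexplorer/test/test_hicExample.py could contain. The helper bin_count is made up for illustration; the point is that py.test with --doctest-modules collects both the >>> examples in docstrings and ordinary test_* functions. Real tests would read their inputs from hicexplorer/test/test_data/.

```python
"""Sketch of a hypothetical hicexplorer/test/test_hicExample.py."""


def bin_count(chrom_length, bin_size):
    """Number of bins needed to cover a chromosome (hypothetical helper).

    The examples below are run by py.test --doctest-modules:

    >>> bin_count(100, 20)
    5
    >>> bin_count(101, 20)
    6
    """
    return -(-chrom_length // bin_size)  # ceiling division on integers


def test_bin_count():
    # Collected by py.test because the function name starts with "test_".
    assert bin_count(60, 20) == 3
    assert bin_count(61, 20) == 4
```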

  5. To write output to the terminal we use logging, not print. To change the logging level, edit hicexplorer/__init__.py. Please remember to set it back to INFO before opening a PR against the develop branch.

  6. All scripts are located in the folder hicexplorer and are named hicNameOfFile.py. However, to run a script you additionally need to add a file bin/hicNameOfFile (without the .py) that imports the script and calls its main function. The last step is to add this bin file to the 'scripts' list at the end of setup.py (line 115 ff.).

  7. I think a good starting point is hicexplorer/hicInfo.py. The file is short but gives a good overview of how things work.

  8. We support Python 2 and 3, and sometimes we still have bugs in Python 3 caused by the conversion we did last year. However, we want to move to Python 3 only in the future, so please use Python 3 and prefer __future__ imports over __past__ ones to keep the code Python 2 compatible.
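Putting the logging, script-layout and __future__ conventions together, a new tool might look roughly like the sketch below. It is only an illustration under assumptions: the module name hicExample, its --matrix option and the exact wrapper contents are hypothetical, so check hicexplorer/hicInfo.py for the real structure.

```python
from __future__ import division, print_function  # keeps Python 2 compatibility; no __past__

# Sketch of a hypothetical new tool, hicexplorer/hicExample.py.
# "hicExample" and its --matrix option are made up for illustration.
import argparse
import logging

# Module-level logger; the log level itself is configured in hicexplorer/__init__.py.
log = logging.getLogger(__name__)


def parse_arguments(args=None):
    """Build the command-line interface (options are hypothetical)."""
    parser = argparse.ArgumentParser(description="Hypothetical example tool.")
    parser.add_argument("--matrix", "-m", required=True,
                        help="Input matrix, e.g. matrix.cool")
    return parser.parse_args(args)


def main(args=None):
    """Entry point that the bin/hicExample wrapper would call."""
    args = parse_arguments(args)
    # Report progress through logging instead of print().
    log.info("Processing %s", args.matrix)
    # Returned only so that this sketch is easy to test.
    return args.matrix


# The matching wrapper bin/hicExample (no .py) would then contain roughly:
#
#     #!/usr/bin/env python
#     from hicexplorer.hicExample import main
#     main()
#
# and "bin/hicExample" would be added to the 'scripts' list in setup.py.
```

Passing args=None lets argparse fall back to sys.argv when the tool is run from the command line, while tests can call main() with an explicit argument list.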

Please ask as many questions as necessary. We are happy about your contribution and will support you as well as we can.

Best,

Joachim

bgruening commented 6 years ago

@msauria please have a look at the develop branch. @joachimwolff is restructuring a few things, and there are some special captureHiC changes and a general conversion module.

joachimwolff commented 6 years ago

I don't have the impression that there is any progress on, or interest in, integrating HiFive into HiCExplorer. Closing this issue now.