deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

deprecate Lieberman format as import/export #257

Closed vivekbhr closed 6 years ago

vivekbhr commented 6 years ago

Following up on the conversation about import/export of matrices: I think it's reasonable to support multiple input formats and keep only one output format. We should also deprecate support for the Lieberman format, since they have moved to .hic and the old format confuses people.

By the way, just to mention it here: we are not deprecating GInteractions format support!

joachimwolff commented 6 years ago

For the future, I am thinking of the following:

The reason for this is that it is difficult to support all these different file formats, especially if no test cases are provided for them (as with the old formats). Additionally, writing new functions or extending old ones becomes complicated if different file formats support different features and we want to use the benefits of all of them; this increases the chance of errors in the code.

I would like to know: which of the old formats should still be supported? The only format I would like to support is h5; all the others can be dropped in my opinion. If you, @vivekbhr, want to keep e.g. GInteractions, please explain briefly why we still need it and whether you want to volunteer as the maintainer for this format, i.e. you write test cases for it and make sure for each release that everything still works.

vivekbhr commented 6 years ago

I would keep .h5, .cool/mcool and GInteractions as export formats. If we support .hic as input, then we should also keep .hic as an output format, since both .cool and .hic are being used in 4DN. That's why I also think we should keep the name hicExport.

GInteractions is the format that the Bioconductor folks use. Yes, I can take care of maintaining it.

fidelram commented 6 years ago

For exports, it is important to keep a format that outputs dense matrices that can be loaded into R.

bgruening commented 6 years ago

We should make this decision entirely dependent on the amount of code and the number of code paths this adds to or keeps in HiCExplorer. I have no problem with dropping all these formats from the main code and instead creating an external tool (cool-to-whatever, whatever-to-cool). IMHO this is all about maintainability, and we should support one format very well.

joachimwolff commented 6 years ago

The only reason I have for changing the name to something other than hicExport is to make it clear that this script has new functions and that many old functions are no longer supported. hicExport would still exist, but it would be deprecated, and it would be clear to everyone that we do not support it anymore.

joachimwolff commented 6 years ago

I reorganized the source code a bit to make it more modular and extensible; see the branch file_formats. The most important features are:

Not everything is implemented so far. Missing:

Moreover, I removed some dead code in HiCMatrix. Please check that I removed only dead code and nothing that is still in use. Please give me some feedback about missing or desired features, and of course bugs :)

joachimwolff commented 6 years ago

Short update:

bgruening commented 6 years ago

Very nice!

crazyhottommy commented 6 years ago

Hi, where can I find hicConvertFileFormats? I installed hicexplorer 2.1.3 via conda. Thanks!

joachimwolff commented 6 years ago

Hi,

This is a feature which will be released with the next release, 2.2. It is not part of HiCExplorer 2.1.x.

If you like, and you know how, you can test it in the develop branch of this GitHub repository. As the name indicates, it is still under development: functionality can change and it is more likely to crash. We would be happy about any feedback from users if something goes wrong or features are missing.

Best,

Joachim

deebhar commented 6 years ago

Hi, I want to convert hic format to h5 or any other format that I can use in HiCExplorer. Is this tool ready? I have also converted hic to cool, but I am still not able to convert it to h5 using hicExport.

crazyhottommy commented 6 years ago

@deebhar you have to do it like this:

Convert hic to cool, and then to h5 for pyGenomeTracks.

See https://github.com/deeptools/HiCExplorer/issues/239 and https://github.com/deeptools/HiCExplorer/issues/257

Use this tool: https://github.com/4dn-dcic/hic2cool

pip install hic2cool
hic2cool Control_H3K27ac_allValidPairs.hic H3K27ac_HiC.cool -r 5000

hicExport --inFile H3K27ac_HiC.cool --outFileName H3K27ac_HiC --inputFormat cool --outputFormat h5
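
For scripting the two steps above, a minimal Python sketch could assemble the same command lines as argument lists. It only uses the flags that appear in the commands above; the function name and its parameters are hypothetical, and newer releases of either tool may expect a different syntax.

```python
import shlex


def build_conversion_commands(hic_file, prefix, resolution=5000):
    """Return the hic -> cool -> h5 commands from this thread as argv lists.

    Hypothetical helper: it only mirrors the flags shown in the comment above.
    """
    cool_file = prefix + ".cool"
    to_cool = ["hic2cool", hic_file, cool_file, "-r", str(resolution)]
    to_h5 = [
        "hicExport",
        "--inFile", cool_file,
        "--outFileName", prefix,
        "--inputFormat", "cool",
        "--outputFormat", "h5",
    ]
    return [to_cool, to_h5]


if __name__ == "__main__":
    commands = build_conversion_commands(
        "Control_H3K27ac_allValidPairs.hic", "H3K27ac_HiC"
    )
    for argv in commands:
        # Print only; pass each list to subprocess.run(argv, check=True) to execute.
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the commands as lists rather than one shell string avoids quoting problems when file names contain spaces.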

deebhar commented 6 years ago

Hi,

Thanks for the information. I was already trying this. I am getting this error after running hicExport:

Traceback (most recent call last):
  File "/usr/local/bin/hicExport", line 7, in <module>
    main()
  File "/usr/local/lib/python2.7/dist-packages/hicexplorer/hicExport.py", line 190, in main
    hic_ma = hm.hiCMatrix(matrixFile=args.inFile[0],
        file_format=args.inputFormat)
  File "/usr/local/lib/python2.7/dist-packages/hicexplorer/HiCMatrix.py", line 128, in __init__
    self.load_cool(matrixFile, pChrnameList=chrnameList,
        pIntraChromosomalOnly=pIntraChromosomalOnly)
  File "/usr/local/lib/python2.7/dist-packages/hicexplorer/HiCMatrix.py", line 182, in load_cool
    cut_intervals_data_frame = self.cooler_file.bins()[['chrom', 'start',
        'end', 'weight']][:]
  File "/usr/local/lib/python2.7/dist-packages/cooler/core.py", line 528, in __getitem__
    return self._slice(self.fields, lo, hi)
  File "/usr/local/lib/python2.7/dist-packages/cooler/api.py", line 219, in _slice
    return bins(grp, lo, hi, fields, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/cooler/api.py", line 393, in bins
    out = get(h5['bins'], lo, hi, fields, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/cooler/core.py", line 54, in get
    dset = grp[field]
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/group.py", line 177, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'weight' doesn't exist)"

Please let me know what I should change to avoid this error.

Thanks

Best Wishes

Deeksha

joachimwolff commented 6 years ago

Please check whether you are using HiCExplorer version 2.1.4. This looks to me like a bug from an earlier version.
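
The traceback above shows the loader requesting the columns chrom, start, end and weight from the bin table, and the KeyError reports that this file has no weight field. As an illustration of a more tolerant lookup, here is a minimal sketch that treats weight as optional. It is not the fix shipped in 2.1.4, which this thread does not show; the function and constant names are hypothetical.

```python
REQUIRED_COLUMNS = ("chrom", "start", "end")


def columns_to_load(available, optional=("weight",)):
    """Pick the bin-table columns to read, skipping optional ones that are absent.

    `available` is the collection of column names present in the file.
    """
    missing = [name for name in REQUIRED_COLUMNS if name not in available]
    if missing:
        raise KeyError("bin table lacks required columns: " + ", ".join(missing))
    # Optional columns (e.g. correction weights) are only requested if they exist.
    present_optional = [name for name in optional if name in available]
    return list(REQUIRED_COLUMNS) + present_optional


if __name__ == "__main__":
    # An uncorrected file without a 'weight' column no longer triggers a KeyError.
    print(columns_to_load(["chrom", "start", "end"]))
    print(columns_to_load(["chrom", "start", "end", "weight"]))
```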

deebhar commented 6 years ago

Hi,

Thanks for your response. You are right: the version I am using is 2.0. Is there a way to upgrade it? I tried reinstalling it, and the same version gets installed. Then I downloaded HiCExplorer using git, which is the new version. But here hicExport is now giving errors:

INFO:root:Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
INFO:root:Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
Traceback (most recent call last):
  File "../HiCExplorer/bin/./hicExport", line 4, in <module>
    from hicexplorer.hicExport import main
  File "/usr/local/lib/python2.7/dist-packages/hicexplorer/hicExport.py", line 3, in <module>
    from hicexplorer import HiCMatrix as hm
  File "/usr/local/lib/python2.7/dist-packages/hicexplorer/HiCMatrix.py", line 16, in <module>
    from .utilities import toBytes
  File "/usr/local/lib/python2.7/dist-packages/hicexplorer/utilities.py", line 7, in <module>
    from unidecode import unidecode
ImportError: No module named unidecode

What is the mistake I am making here? I am calling the program from the bin folder.

Thanks

Best Wishes

Deeksha

joachimwolff commented 6 years ago

Please use the package manager conda and the bioconda channel, and use environments to rule out possible errors.
Download conda from here: https://conda.io/miniconda.html
Install it, and add the bioconda and conda-forge channels: http://bioconda.github.io/
Create a HiCExplorer environment:
conda create --name hicexplorer hicexplorer python=3.6
and switch to the environment:
source activate hicexplorer

One minor issue you may run into: hic files don't follow the conventions; they store their correction factors in differently named fields. HiCExplorer does not support this in the stable version, but support is in the development branch. If you really need these correction factors, please write again. A better solution is to take the raw data and apply the correction with the HiCExplorer module hicCorrectMatrix.

//edit

Maybe a better solution for you is the Galaxy HiCExplorer: https://hicexplorer.usegalaxy.eu Documentation: hicexplorer.readthedocs.org, and http://galaxyproject.github.io/training-material/topics/epigenetics/tutorials/hicexplorer/tutorial.html and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6031062/
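
An ImportError like the one for unidecode earlier in this thread usually means the active interpreter cannot see a dependency. As a quick check before running a tool from a source checkout, a minimal Python 3 sketch can list which top-level modules are not importable. The function name is hypothetical, and the module list is only an example taken from the traceback, not the full dependency list of HiCExplorer.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot find."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # 'unidecode' is the module named in the ImportError above.
    missing = missing_modules(["unidecode"])
    if missing:
        print("Not importable in this environment: " + ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

Running it inside the activated conda environment and again outside of it shows whether the environment is the one that provides the dependency.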

deebhar commented 6 years ago

Hi

Thanks a lot, it's working now. Thank you so much.

Best, Deeksha

deebhar commented 6 years ago

Hi,

I have a question. What do the values in the boundaries.bed output file of hicPlotTADs refer to?

Are they the exact boundaries of the TADs for a given resolution? Can I use those values as the start and stop positions of a TAD?

deebhar commented 6 years ago

Hi,

The reason I am asking is that in the enclosed screenshot you can see an image on the left and the corresponding locations from a tad_score.bm file. As your paper states, the boundaries estimated using the TAD separation score are shown as vertical lines. If these are the boundaries, what are the values in the bm file? I have to find the start and stop locations of the TADs. How can I get them?

Sorry for bothering you again and again.

Thanks for your help.

Deeksha

joachimwolff commented 6 years ago

Hi,

First, there is no screenshot. I think you need to attach it on GitHub; you cannot do this via mail.

The boundaries.bed file lists the locations of the TAD boundaries consecutively:

chr1 1 2 0.1
chr1 4 5 0.3
chr1 7 8 0.4

This means we have boundaries at positions 1, 4 and 7, forming the TADs 1 - 5 and 5 - 8. The values in the domains.bed file are the start and stop positions of each TAD; if you compare the boundaries and domains files, you will see that two consecutive lines from the boundaries file form one domain. I think the domains.bed file is more helpful for your needs. For more information please see: https://hicexplorer.readthedocs.io/en/latest/content/tools/hicFindTADs.html#hicfindtads

Best,

Joachim
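
To make the pairing of consecutive boundary lines concrete, here is a minimal Python sketch that reproduces the toy example above (TADs 1 - 5 and 5 - 8). The coordinate convention is an assumption inferred only from that example: the first domain on a chromosome starts at the first boundary's start, and every domain ends at the end of the next boundary. The function names are hypothetical; for real data the domains file written by hicFindTADs already contains these intervals and should be preferred.

```python
def parse_boundaries(text):
    """Parse whitespace-separated boundary lines into (chrom, start, end) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        rows.append((fields[0], int(fields[1]), int(fields[2])))
    return rows


def domains_from_boundaries(rows):
    """Pair consecutive boundaries on the same chromosome into domains.

    Assumed convention (taken from the toy example only): a domain runs from
    the previous split point to the end of the next boundary.
    """
    domains = []
    previous = None  # (chrom, start position of the domain being built)
    for chrom, start, end in rows:
        if previous is not None and previous[0] == chrom:
            domains.append((chrom, previous[1], end))
            previous = (chrom, end)
        else:
            # First boundary on this chromosome opens the first domain.
            previous = (chrom, start)
    return domains


if __name__ == "__main__":
    example = "chr1 1 2 0.1\nchr1 4 5 0.3\nchr1 7 8 0.4\n"
    for chrom, start, end in domains_from_boundaries(parse_boundaries(example)):
        print(chrom, start, end)
```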

deebhar commented 6 years ago

Hi

Thanks for your information.

Best wishes

Deeksha
