Closed (vivekbhr closed this issue 6 years ago)
For the future, I am thinking of the following:
The reason for this is that it is difficult to support all these different file formats, especially if no test cases are provided for them (as with the old formats). It also complicates writing new functions or extending old ones: when different file formats support different features and we want to use the benefits of all of them, the chance of errors in the code increases.
I would like to know: which of the old formats should still be supported? The only format I would like to support is h5; all the others can be dropped in my opinion. If you, @vivekbhr, want to keep e.g. GInteractions, please explain very briefly why we still need it and whether you want to volunteer as the maintainer for this format, i.e. that you write test cases for it and make sure for each release that everything still works.
I would keep .h5, .cool/mcool and GInteraction format as exports. If we support .hic as input, then we should also keep .hic as an output format, since both .cool and .hic are being used in 4DN. That's why I also think we should keep the name HiCExport.
GInteraction is the format that bioconductor folks use. Yes, I can take care of maintaining it.
For exports it is important to keep a format that outputs dense matrices that can be loaded into R.
We should make this decision entirely dependent on the amount of code and the number of code paths this adds to or keeps in HiCExplorer. I have no problem with dropping all these formats from the main code and creating an external tool instead (cool-to-whatever, whatever-to-cool). IMHO this is all about maintainability, and we should support one format very well.
The only reason I have to change the name to something other than hicExport is to make it clear that this script has new functions and that many old functions are no longer supported. hicExport would still exist, but it would be deprecated, and it would be clear to everyone that we no longer support it.
I reorganized the source code a bit to make it more modular and extendable; see the branch file_formats. The most important features are:
MatrixFileHandler object, which is responsible for loading and saving the different matrix formats.
save function the file type to cool at the moment. But this is open to discussion; we can decide later how to handle this.
matrixFileHandler to interact with HiCMatrix, a base class MatrixFile and classes which inherit from it: cool, mcool, homer, h5, ginteractions, hicpro.
hicConvertFileFormats, it can convert at the moment:
hic file format we have to implement it on our own. (Or maybe someone finds a lib for that; I haven't had success so far.)
Not everything is implemented so far. Missing:
Moreover, I removed some dead code in HiCMatrix. Please check that I only removed dead code and not code that is still active. Please give me some feedback about missing or desired features, and of course bugs :)
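The handler design described in this comment (a base class MatrixFile, one subclass per format, and a MatrixFileHandler that delegates loading and saving) could look roughly like the sketch below. It is not the code from the file_formats branch: the method names load and save, the constructor arguments, the registry dictionary and the placeholder return strings are all assumptions made for illustration, and only two of the listed formats are sketched.

```python
class MatrixFile:
    """Base class; each concrete format overrides load() and save()."""

    def __init__(self, matrix_file_name):
        self.matrix_file_name = matrix_file_name

    def load(self):
        raise NotImplementedError

    def save(self, matrix):
        raise NotImplementedError


class Cool(MatrixFile):
    def load(self):
        # Placeholder: a real implementation would read the cool file here.
        return "loaded cool matrix from {}".format(self.matrix_file_name)


class H5(MatrixFile):
    def load(self):
        # Placeholder: a real implementation would read the h5 file here.
        return "loaded h5 matrix from {}".format(self.matrix_file_name)


class MatrixFileHandler:
    """Picks the format class by name and delegates load/save to it."""

    # Hypothetical registry; the thread also lists mcool, homer,
    # ginteractions and hicpro, which are omitted here.
    _FORMATS = {"cool": Cool, "h5": H5}

    def __init__(self, file_type, matrix_file_name):
        try:
            format_class = self._FORMATS[file_type]
        except KeyError:
            raise ValueError("unsupported matrix format: {}".format(file_type))
        self.matrix_file = format_class(matrix_file_name)

    def load(self):
        return self.matrix_file.load()

    def save(self, matrix):
        return self.matrix_file.save(matrix)


handler = MatrixFileHandler("cool", "matrix.cool")
print(handler.load())
```

With this layout, adding a format means adding one subclass and one registry entry, and the calling code never branches on the file type itself.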
Short update:
Very nice!
Hi, where can I find hicConvertFileFormats? I installed hicexplorer 2.1.3 via conda.
Thanks!
Hi,
This feature will be part of the upcoming 2.2 release. It is not part of HiCExplorer 2.1.x.
If you like, and know how, you can test it in the develop branch of this GitHub repository. As the name indicates, it is still under development, so functionality can change and crashes are more likely. We would be happy to receive feedback from users if something goes wrong or features are missing.
Best,
Joachim
Hi, I want to convert the hic format to h5 or any other format that I can use in HiCExplorer. Is this tool ready? I have also converted hic to cool, but I am still not able to convert it to h5 using hicExport.
@deebhar you have to do it like this: convert hic to cool and then to h5 for pyGenomeTracks
see https://github.com/deeptools/HiCExplorer/issues/239 and https://github.com/deeptools/HiCExplorer/issues/257
use this tool https://github.com/4dn-dcic/hic2cool
pip install hic2cool
hic2cool Control_H3K27ac_allValidPairs.hic H3K27ac_HiC.cool -r 5000
hicExport --inFile H3K27ac_HiC.cool --outFileName H3K27ac_HiC --inputFormat cool --outputFormat h5
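For scripting the two conversion steps above, a small helper can assemble the same command lines. This is only a sketch: the function name conversion_commands is hypothetical, the arguments mirror the commands exactly as posted in this thread, and other versions of hic2cool or HiCExplorer may expect different options. The helper only builds the commands and does not run them.

```python
import shlex


def conversion_commands(hic_file, prefix, resolution):
    """Build the hic -> cool -> h5 commands as argument lists."""
    cool_file = prefix + ".cool"
    to_cool = ["hic2cool", hic_file, cool_file, "-r", str(resolution)]
    to_h5 = [
        "hicExport",
        "--inFile", cool_file,
        "--outFileName", prefix,
        "--inputFormat", "cool",
        "--outputFormat", "h5",
    ]
    return [to_cool, to_h5]


commands = conversion_commands(
    "Control_H3K27ac_allValidPairs.hic", "H3K27ac_HiC", 5000
)
for argv in commands:
    # Quote each argument so the printed line can be pasted into a shell.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Each list can be passed to subprocess.run(argv, check=True) once both tools are installed.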
Hi,
Thanks for the information. I had already tried this. I get this error after running hicExport:
Traceback (most recent call last):
File "/usr/local/bin/hicExport", line 7, in <module>
main()
File "/usr/local/lib/python2.7/dist-packages/hicexplorer/hicExport.py", line 190, in main
hic_ma = hm.hiCMatrix(matrixFile=args.inFile[0],
file_format=args.inputFormat)
File "/usr/local/lib/python2.7/dist-packages/hicexplorer/HiCMatrix.py", line 128, in __init__
self.load_cool(matrixFile, pChrnameList=chrnameList,
pIntraChromosomalOnly=pIntraChromosomalOnly)
File "/usr/local/lib/python2.7/dist-packages/hicexplorer/HiCMatrix.py", line 182, in load_cool
cut_intervals_data_frame = self.cooler_file.bins()[['chrom', 'start',
'end', 'weight']][:]
File "/usr/local/lib/python2.7/dist-packages/cooler/core.py", line 528, in __getitem__
return self._slice(self.fields, lo, hi)
File "/usr/local/lib/python2.7/dist-packages/cooler/api.py", line 219, in _slice
return bins(grp, lo, hi, fields, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/cooler/api.py", line 393, in bins
out = get(h5['bins'], lo, hi, fields, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/cooler/core.py", line 54, in get
dset = grp[field]
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/group.py", line 177, in __getitem__
oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'weight' doesn't exist)"
Please let me know what I should change to avoid this error.
Thanks
Best Wishes
Deeksha
-- Thanks and Regards
Dr. Deeksha Bhartiya
Please check whether you are using HiCExplorer version 2.1.4. This looks to me like a bug from an earlier version.
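The KeyError in the traceback says that the bins table of the cool file has no weight field, while the loading code asks for chrom, start, end and weight. A quick way to see which of the requested columns are absent is sketched below. The helper name split_bin_columns is hypothetical, and the column list is written out by hand; with the cooler package it could presumably be taken from the bins table of the opened file, which is an assumption and not shown here.

```python
def split_bin_columns(available, wanted=("chrom", "start", "end", "weight")):
    """Return (present, missing) column names, in the order of `wanted`."""
    available = set(available)
    present = [name for name in wanted if name in available]
    missing = [name for name in wanted if name not in available]
    return present, missing


# Column names of a cool file without a weight field, typed in by hand
# for this example rather than read from a real file.
columns = ["chrom", "start", "end"]
present, missing = split_bin_columns(columns)
if missing:
    print("bins table lacks: " + ", ".join(missing))
```

A loader that requests only the columns in present would avoid this particular KeyError, at the cost of loading the matrix without correction factors.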
Hi,
Thanks for your response. You are right, the version I am using is 2.0. Is there a way to upgrade it? I tried reinstalling it, but the same version gets installed. Then I downloaded HiCExplorer using git, which is the new version. But there hicExport now gives these errors:
INFO:root:Generating grammar tables from /usr/lib/python2.7/lib2to3/Grammar.txt
INFO:root:Generating grammar tables from /usr/lib/python2.7/lib2to3/PatternGrammar.txt
Traceback (most recent call last):
File "../HiCExplorer/bin/./hicExport", line 4, in <module>
from hicexplorer.hicExport import main
File "/usr/local/lib/python2.7/dist-packages/hicexplorer/hicExport.py", line 3, in <module>
from hicexplorer import HiCMatrix as hm
File "/usr/local/lib/python2.7/dist-packages/hicexplorer/HiCMatrix.py", line 16, in <module>
from .utilities import toBytes
File "/usr/local/lib/python2.7/dist-packages/hicexplorer/utilities.py", line 7, in <module>
from unidecode import unidecode
ImportError: No module named unidecode
What mistake am I making here? I am calling the program from the bin folder.
Thanks
Best Wishes
Deeksha
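The ImportError above means that the Python interpreter running hicExport cannot find the unidecode module. One way to check for missing modules before starting a tool is sketched below. It uses importlib.util.find_spec from the standard library, which requires Python 3 (the traceback above comes from Python 2.7); the helper name missing_modules is hypothetical, and unidecode is the only module name taken from this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" ships with Python and serves as a known-good reference;
# "unidecode" is the module named in the ImportError.
print(missing_modules(["json", "unidecode"]))
```

An empty list means that every listed module is visible to the interpreter that ran the check, which is not necessarily the interpreter a globally installed script uses.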
Please use the package manager conda with the bioconda channel, and use environments to rule out possible errors.
Download conda from here: https://conda.io/miniconda.html
Install it, and add the bioconda and conda-forge channels: http://bioconda.github.io/
Create a HiCExplorer environment:
conda create --name hicexplorer hicexplorer python=3.6
and switch to the environment: source activate hicexplorer
One minor issue you may run into: hic files do not follow a common convention; they store their correction factors in differently named fields. The stable version of HiCExplorer does not support this, but the development branch does. If you really need these correction factors, please write again. A better solution is to take the raw data and apply the correction with the HiCExplorer module hicCorrectMatrix.
//edit
Maybe a better solution for you is the Galaxy HiCExplorer: https://hicexplorer.usegalaxy.eu Documentation: hicexplorer.readthedocs.org, and http://galaxyproject.github.io/training-material/topics/epigenetics/tutorials/hicexplorer/tutorial.html and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6031062/
Hi,
Thanks a lot, it's working now. Thank you so much.
Best Deeksha
Hi,
I have a question. What do the values in the hicPlotTADs output file boundaries.bed refer to?
Are they the exact boundaries of the TADs for a given resolution? Can I use those values as start and stop positions of a TAD?
Hi,
The reason I am asking is that in the enclosed screenshot you can see an image on the left and the corresponding locations from a tad_score.bm file. As written in your paper, the boundaries, estimated using the TAD separation score, are shown as vertical lines. If these are the boundaries, what are the values in the bm file? I have to find the start and stop locations of the TADs. How can I get them?
Sorry for bothering you again and again.
Thanks for your help.
Deeksha
Hi,
First, there is no screenshot. I think you need to attach it on GitHub; you cannot do this via mail.
The boundaries.bed file lists the locations of the TAD boundaries consecutively:
chr1 1 2 0.1
chr1 4 5 0.3
chr1 7 8 0.4
means we have boundaries at positions 1, 4 and 7, forming the TADs 1 - 5 and 5 - 8. The values in the domains.bed file are the start and stop positions of each TAD; if you compare the boundaries and domains files, you will see that two consecutive lines from the boundaries file form one domain. I think the domains.bed file is more helpful for your needs. For more information please see: https://hicexplorer.readthedocs.io/en/latest/content/tools/hicFindTADs.html#hicfindtads
Best,
Joachim
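Since the domains file mentioned above holds the start and stop position of each TAD, the positions can be read from its first three columns. The sketch below assumes a BED-style layout (chromosome, start, end, then optional extra columns); the function name read_domains is hypothetical, and the two example lines simply reuse the TADs 1 - 5 and 5 - 8 from the explanation above instead of real data.

```python
def read_domains(lines):
    """Parse BED-style lines into (chrom, start, end) tuples."""
    domains = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the header or comment lines BED files may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        domains.append((fields[0], int(fields[1]), int(fields[2])))
    return domains


# Illustrative lines only; a real file would be passed as an open file handle.
example = ["chr1\t1\t5", "chr1\t5\t8"]
for chrom, start, end in read_domains(example):
    print(chrom, start, end)
```

For a file on disk, the same function can be called with an open handle, for example inside a `with open(path) as handle:` block.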
Hi
Thanks for your information.
Best wishes
Deeksha
Extending the conversation about import/export of matrices, I think it's valid to support multiple input formats and keep only one output format. We should also deprecate support for the lieberman format, since they have moved to .hic, which confuses people.
By the way, just to mention it here: we are, however, not deprecating GInteractions format support!