deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

hicBuildMatrix --region: invalid genomicRegion value #242

Closed Rmulet closed 6 years ago

Rmulet commented 6 years ago

Hi,

I am analyzing capture Hi-C data, so I wanted to build a matrix only for the region that's enriched in the experiment. In principle, hicBuildMatrix offers this option with the parameter --region, but when I specify the coordinates of interest I get the following error message:

hicBuildMatrix: error: argument --region/-r: invalid genomicRegion value: 'chr21'

I have tried different options, namely adding the start/end positions and removing the 'chr' string, but none of them have worked. I suspect it has something to do with the Python version, as I have installed HiCExplorer in various ways (i.e. via pip, conda and manual install) and this only happens with those that run python 3. Yes, it's a bit messy, but I use python2 by default with pip, but python 3 with conda...

Best regards,

Roger

PS: It's my first time doing HiC data analysis, so perhaps this is not the best way to approach this. In any case, I don't think this is the expected behaviour of the program.
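A note on where a message of this shape comes from: argparse prints `invalid <name> value` whenever the callable passed as `type=` raises `TypeError` or `ValueError`, and `<name>` is that callable's `__name__`. The sketch below reproduces an error of the same form under Python 3. The function names, and the guess that a Python 2-only `str.translate(None, chars)` call is the trigger, are assumptions for illustration; the thread does not show the actual HiCExplorer code.

```python
import argparse


class ParseFailure(Exception):
    """Raised instead of exiting, so the message can be inspected."""


class RaisingParser(argparse.ArgumentParser):
    def error(self, message):
        # argparse normally prints the message and exits with status 2.
        raise ParseFailure(message)


def genomic_region(text):
    """Hypothetical region parser written in Python 2 style."""
    region = "".join(text.split())  # drop all whitespace
    # str.translate(None, chars) deletes chars in Python 2 only.
    # Python 3 raises TypeError here, which argparse reports as
    # "invalid genomic_region value".
    return region.translate(None, ",.;|!{}()").replace("-", ":")


def genomic_region_portable(text):
    """Same clean-up, written so it runs on Python 2 and Python 3."""
    region = "".join(text.split())
    for char in ",.;|!{}()":
        region = region.replace(char, "")
    region = region.replace("-", ":")
    if not region:
        raise argparse.ArgumentTypeError(
            "{!r} is not a valid region".format(text))
    return region


def try_parse(type_func, value):
    """Parse `--region value` and return the result or the error text."""
    parser = RaisingParser(prog="demo")
    parser.add_argument("--region", "-r", type=type_func)
    try:
        return parser.parse_args(["--region", value]).region
    except ParseFailure as exc:
        return "error: {}".format(exc)


if __name__ == "__main__":
    print(try_parse(genomic_region, "chr21"))
    print(try_parse(genomic_region_portable, "chr21"))
```

If this guess is right, it would explain why adding start/end positions or dropping the 'chr' prefix made no difference: the failure would happen before the value is ever inspected.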

gtrichard commented 6 years ago

Hello Rmulet and thank you for your feedback and using HiCExplorer.

The easiest and safest way to use HiCExplorer is to install an anaconda2 or miniconda2 distribution and then use conda to install HiCExplorer and its dependencies.

This can be done easily on any machine, including remote machines such as clusters where you are a non-root user.

Could you try that, please? If you need help, just ask here.

There are some known issues with Python 3, so we always recommend using a Python 2 distribution.

We will further look at your error and implement test cases to check this option in python3.

joachimwolff commented 6 years ago

Hi Roger,

you can create a conda environment and let it use Python 2:

conda create --name hicexplorer_python27 python=2.7 hicexplorer

and then you can activate it with: source activate hicexplorer_python27 and deactivate it with: source deactivate. This will not affect your installed Python 3 software.

Rmulet commented 6 years ago

Thanks a lot for your advice, I will do that. I was already planning to find a solution involving python2, but in any case I wanted to report this error.

DonStephano commented 6 years ago

@Rmulet: Just a suggestion: you can create a genome-wide matrix with your Capture Hi-C data and then check out your enriched region of interest with hicPlotViewpoint. Building a genome-wide matrix will take some time (depending on the size/sequencing depth/resources), but afterwards you are more flexible...

Rmulet commented 6 years ago

@DonStephano,

Thanks for the suggestion. Actually, I started by creating a genome-wide matrix with hicBuildMatrix, and that went fine, but I had issues with the correction because, well, most bins have very few reads. I suppose I could go around that by specifying very extreme z-value thresholds?

Aside from the increased flexibility, would there be any advantage to a genome wide matrix?

DonStephano commented 6 years ago

Ah, I wouldn't correct the Capture Hi-C matrix with hicCorrectMatrix. As far as I understand, hicCorrectMatrix assumes that each bin has the same number of reads, and because you are enriching a couple of regions in your Capture Hi-C, this would heavily impact your data. I would use the uncorrected matrix.

Concerning the above mentioned error: with version 1.8 it is possible to build a matrix for a specific region (@joachimwolff)

Rmulet commented 6 years ago

@DonStephano, it is indeed possible, but not with Python 3 as far as I can tell -- that's why I reported this as a bug. Following the suggestions of joachimwolff and gtrichard, I have installed HiCExplorer using Python2 via conda and everything works just fine.

If I have a matrix only for the region of my capture, then it should be acceptable to correct it with hicCorrectMatrix, right? If I understand it correctly, I am removing bins with extremely low or extremely high counts in that region, which should be homogeneous in terms of coverage.
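To make the idea of removing bins with extreme counts concrete, here is a minimal sketch that scores each bin's total coverage with a MAD-based modified z-score and keeps the bins inside a lower/upper threshold. The helper names and the toy matrix are hypothetical, and whether hicCorrectMatrix computes its thresholds exactly this way is an assumption; the thread only speaks of "z-value thresholds".

```python
from statistics import median


def modified_z_scores(values):
    """Modified z-score: 0.6745 * (x - median) / MAD."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values (or most of them) are identical; nothing stands out.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def bins_to_keep(matrix, lower, upper):
    """Return indices of bins whose coverage score lies in [lower, upper]."""
    coverage = [sum(row) for row in matrix]  # total contacts per bin
    scores = modified_z_scores(coverage)
    return [i for i, s in enumerate(scores) if lower <= s <= upper]


if __name__ == "__main__":
    # Toy matrix: only the diagonal is filled, so row sums equal the diagonal.
    counts = [10, 12, 11, 0, 13, 100]
    toy = [[c if i == j else 0 for j in range(len(counts))]
           for i, c in enumerate(counts)]
    print(bins_to_keep(toy, lower=-3, upper=5))
```

In a genome-wide capture matrix most bins are nearly empty, so the median and MAD are dominated by the unenriched background and the enriched bins look like outliers. Restricting the matrix to a contiguous captured region avoids that.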

DonStephano commented 6 years ago

Ah I see. I didn't get that it's a python related issue :)

@Rmulet: So, you have enriched a complete continuous region of interest, e.g. a certain TAD?! If so, the normalization using hicCorrectMatrix should be ok (from my point of view). I was thinking of e.g. Promoter Capture Hi-C with a lot of 'gaps' in the matrix...

Rmulet commented 6 years ago

@DonStephano: Yes, we have enriched a region comprising a few TADs, so bins in the resulting matrix should be contiguous with no gaps in between. However, it is good to keep your advice in mind in case we do promoter capture in the future.

And yes, it was a Python related issue, but since it's also my first time dealing with capture HiC, I have also appreciated your suggestions in this respect. I hope, though, that the developers can forgive me for diverting the topic!

joachimwolff commented 6 years ago

It is fixed with PR #244. Although it is in master, I will wait a bit before making a new bug fix release. Maybe we will find a few more in the next few days (my HiWi is working on that), and I do not want to do a release for every one-line patch.

LeilyR commented 6 years ago

I have the same problem as @Rmulet with versions newer than 1.8, regardless of the Python version I use. Has something changed so that the region now has to be CHR:START-END and can no longer be just CHR?
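For reference, the two spellings in question (CHR alone and CHR:START-END) can both be handled by one small parser. This is a sketch only: `parse_region` and its return format are hypothetical and say nothing about what any particular HiCExplorer version accepts.

```python
import re

# Chromosome name, optionally followed by :START-END (or :START:END).
_REGION = re.compile(
    r"^(?P<chrom>[^:\s]+)(?::(?P<start>\d+)[-:](?P<end>\d+))?$")


def parse_region(text):
    """Return (chrom, start, end); start and end are None for a bare CHR."""
    cleaned = text.replace(",", "").strip()  # allow 1,000,000 style numbers
    match = _REGION.match(cleaned)
    if match is None:
        raise ValueError("not a valid region: {!r}".format(text))
    chrom = match.group("chrom")
    if match.group("start") is None:
        return chrom, None, None
    start, end = int(match.group("start")), int(match.group("end"))
    if start >= end:
        raise ValueError("start must be smaller than end: {!r}".format(text))
    return chrom, start, end


if __name__ == "__main__":
    print(parse_region("chr21"))
    print(parse_region("chr21:1,000,000-2,000,000"))
```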

LeilyR commented 6 years ago

I tried 2.1.4 and it works perfectly fine. Thanks!