deeptools / HiCExplorer

HiCExplorer is a powerful and easy to use set of tools to process, normalize and visualize Hi-C data.
https://hicexplorer.readthedocs.org
GNU General Public License v3.0

hicFindRestSite ImportError: Bio.Alphabet has been removed from Biopython. #675

Closed LuminescentBeing closed 3 years ago

LuminescentBeing commented 3 years ago

I am trying to use the hicFindRestSite command. I use the following code:

hicFindRestSite --f hg19.fa --p GANTC -o rest_site_positions.bed

It won't run and it gives the following error:

Traceback (most recent call last):
  File "/Users/luzruiz/opt/anaconda3/envs/hic/bin/hicFindRestSite", line 4, in <module>
    from hicexplorer.hicFindRestSite import main
  File "/Users/luzruiz/opt/anaconda3/envs/hic/lib/python3.8/site-packages/hicexplorer/hicFindRestSite.py", line 11, in <module>
    from Bio.Alphabet import generic_dna
  File "/Users/luzruiz/opt/anaconda3/envs/hic/lib/python3.8/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
    raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.

zqzneptune commented 3 years ago

I assume this is for the Arima kit, which has come up before: https://github.com/deeptools/HiCExplorer/issues/659#issue-784897851

My current workaround is to use HiC-Pro: http://nservant.github.io/HiC-Pro/UTILS.html#digest-genome-py

-Johnson

LuminescentBeing commented 3 years ago

Hi Johnson @zqzneptune,

Yes, I do use the Arima kit! I really appreciate the response. I'll check out HiC-Pro right now. After you do that, can you continue using HiCExplorer without any issues?

-Luz

joachimwolff commented 3 years ago

Hi,

How did you install HiCExplorer and its dependencies? We have pinned the biopython dependency to < 1.77, because version 1.77 had this API change, which is causing this crash.

The conda recipe has the same pin: https://github.com/bioconda/bioconda-recipes/blob/master/recipes/hicexplorer/meta.yaml#L28

Best,

Joachim
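To check whether an environment satisfies the pin described above, something like the following could be run inside it. This is a minimal sketch: the helper names (parse_version, biopython_is_compatible, installed_biopython_version) are made up for illustration, and the (1, 77) bound is taken from the comment above, not read from the HiCExplorer package itself.

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '1.76' or '1.78.dev0' into a tuple of its leading numeric parts."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(piece))
    return tuple(parts)


def biopython_is_compatible(version: str, upper_bound=(1, 77)) -> bool:
    """True if the version is below the upper bound (assumed pin: < 1.77)."""
    return parse_version(version) < upper_bound


def installed_biopython_version():
    """Return the installed biopython version string, or None if it is absent."""
    try:
        return metadata.version("biopython")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_biopython_version()
    if found is None:
        print("biopython is not installed in this environment")
    elif biopython_is_compatible(found):
        print(f"biopython {found} satisfies the assumed pin (< 1.77)")
    else:
        print(f"biopython {found} is too new for the assumed pin (< 1.77)")
```

If the last message is printed, recreating the environment so that conda can resolve the pinned versions is probably simpler than downgrading biopython by hand.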

LuminescentBeing commented 3 years ago

Hi @joachimwolff,

I installed HiCExplorer using the following:

conda install hicexplorer -c bioconda -c conda-forge

I also get the same error message when I try to use hicBuildMatrix. What can I do to get the right dependencies?

joachimwolff commented 3 years ago

What version of HiCExplorer gets installed? What version of biopython?

conda list | grep biopython

conda list | grep hicexplorer

As always, please use conda environments to keep different software separate and to avoid conflicts between version numbers:

conda create --name hicexplorer hicexplorer=3.6 python=3.8

Best,

Joachim

LuminescentBeing commented 3 years ago

Hi @joachimwolff,

The issue has been resolved. It turns out I had an outdated version of HiCExplorer. I installed it using conda install hicexplorer=3.6 and now I am able to run the code.

Thank you so much for the help!

Best wishes,

Luz

zqzneptune commented 3 years ago

Hi @joachimwolff, just a follow-up question: for finding restriction sites for the Arima kit, which search pattern should be passed to hicFindRestSite? I tried -searchPattern ^GATC,G^ANTC, but did not get any results. Thanks!

-Johnson

LuminescentBeing commented 3 years ago

Oh, I also have a question in regards to that.

I ran the following and it completed without errors:

hicFindRestSite --fasta hg19/hg19.fa --searchPattern GA.TC -o rest_site_positions.bed

What would be the dangling sequence for both ^GATC and G^ANTC ?

joachimwolff commented 3 years ago

@zqzneptune You can search for a regex pattern as described here: https://docs.python.org/3/library/re.html

Please note that Python's symbols and their meanings may differ from those of other software. In a Python regex, ^ stands for the start of a string, followed by your sequence. I am not sure this is what you want; i.e., this would only match if a) for ^GATC, the sequence is at the start of a line with nothing in front of it, and b) for G^ANTC, the previous line ends with G and the new line starts with ANTC.
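To see how these patterns behave, they can be tried with Python's re module on a short string. This is a minimal sketch: the toy sequence and the find_sites helper are invented for illustration, and it is an assumption that hicFindRestSite applies the search pattern to each sequence in a comparable way.

```python
import re

# Toy sequence, invented for illustration: GATC at offset 2,
# GAATC at offset 8 and GAGTC at offset 15 (0-based).
seq = "TTGATCAAGAATCCCGAGTCTT"


def find_sites(pattern: str, sequence: str):
    """Return (start, end, matched_text) for each non-overlapping match."""
    return [
        (m.start(), m.end(), m.group())
        for m in re.finditer(pattern, sequence, re.IGNORECASE)
    ]


if __name__ == "__main__":
    # ^ is a zero-width anchor, not a cut-site marker, and N is a literal
    # letter in a regex, not a wildcard. The '.' matches any single character.
    for pattern in ["^GATC", "G^ANTC", "GANTC", "GA.TC", "GATC|GA.TC"]:
        print(pattern, find_sites(pattern, seq))
```

On this toy sequence the patterns containing ^ or a literal N find nothing, while GA.TC finds the two five-base sites and GATC|GA.TC finds all three. A stricter alternative to the dot is GA[ACGT]TC, which excludes N bases in the reference.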

@LuminescentBeing The dangling sequence depends on the restriction enzyme; in my experience, a good database is: https://enzymefinder.neb.com/#!/sequence/GATC?category=all&type=any&exact=yes&exactlen=yes#nebheader
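As a rough illustration of how the overhang follows from the cut position, here is a minimal sketch. It assumes a palindromic recognition site, a cut marked with ^ on the top strand, and a 5' overhang. The function name is hypothetical, and whether the result is exactly the form HiCExplorer expects as a dangling sequence is an assumption that should be checked against the enzyme database and the HiCExplorer documentation.

```python
def five_prime_overhang(site_with_cut: str) -> str:
    """Return the single-stranded 5' overhang for a palindromic site.

    site_with_cut uses ^ to mark the top-strand cut, e.g. 'G^ANTC'.
    For a palindromic site the bottom strand is cut at the mirrored
    position, so the overhang is the stretch between the two cuts.
    """
    cut = site_with_cut.index("^")  # raises ValueError if no ^ is present
    site = site_with_cut.replace("^", "")
    if cut * 2 > len(site):
        # A cut past the midpoint leaves a 3' overhang, not handled here.
        raise ValueError("cut position implies a 3' overhang")
    return site[cut:len(site) - cut]


if __name__ == "__main__":
    for notation in ["^GATC", "G^ANTC", "A^AGCTT"]:
        print(notation, "->", five_prime_overhang(notation))
```

Under these assumptions the sketch gives GATC for ^GATC and ANT for G^ANTC, where N stands for any base.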