WGLab / RepeatHMM

a hidden Markov model to infer simple repeats from genome sequences

ModuleNotFoundError: No module named 'Bio' #14

Closed efsalcedo closed 6 years ago

efsalcedo commented 6 years ago

I installed the program according to the instructions. Now I'm trying to run the repeatHMM.py script, but I get an error about a module imported by one of the included scripts.

python repeatHMM.py /Users/fernando/CURSO_INCAN_Junio_3-4_2018/Sequences/IonCode.bam

Traceback (most recent call last):
  File "repeatHMM.py", line 17, in <module>
    from scripts import myBAMhandler
  File "/Applications/RepeatHMM-master/bin/scripts/myBAMhandler.py", line 21, in <module>
    from Bio import pairwise2
ModuleNotFoundError: No module named 'Bio'

liuqianhn commented 6 years ago

Hi @efsalcedo, it seems that "biopython" cannot be found. Did you install "biopython", and is the folder where it is installed on the Python package search path?
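
One way to check this is to ask the interpreter that runs repeatHMM.py where it would load "Bio" from. The following is a minimal sketch for Python 3, not part of RepeatHMM; the helper name module_location is made up for illustration.

```python
import importlib.util
import sys


def module_location(name):
    """Return the path a top-level module would be loaded from, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # pip must have installed biopython into the site-packages of this interpreter.
    print("interpreter:", sys.executable)
    print("Bio found at:", module_location("Bio"))
```

If the second line prints None, the "pip" that was used probably belongs to a different Python installation than the "python" that runs the script.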

efsalcedo commented 6 years ago

Hi liuqianhn, thanks for your quick response. I assumed biopython had been installed by the following command, which I ran:

pip install peakutils==1.0.3 hmmlearn sklearn biopython

efsalcedo commented 6 years ago

I'm trying to re-install:

pip install biopython
Collecting biopython
  Cache entry deserialization failed, entry ignored
  Downloading https://files.pythonhosted.org/packages/6a/22/c5b6e425d7ed86a52fe10be670b95513b43e0853908d70a984d9a68a9945/biopython-1.72-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (2.2MB)
    100% |████████████████████████████████| 2.2MB 93kB/s
Requirement already satisfied: numpy in /Users/fernando/miniconda3/lib/python3.6/site-packages (from biopython) (1.14.5)
Installing collected packages: biopython
Successfully installed biopython-1.72

efsalcedo commented 6 years ago

After reinstalling biopython, I now get a different error message:

python repeatHMM.py IonCode_0109.bam

Traceback (most recent call last):
  File "repeatHMM.py", line 17, in <module>
    from scripts import myBAMhandler
  File "/Applications/RepeatHMM-master/bin/scripts/myBAMhandler.py", line 24, in <module>
    import getAlignment
ModuleNotFoundError: No module named 'getAlignment'

liuqianhn commented 6 years ago

Hi @efsalcedo, may I know whether you are running RepeatHMM with Python 3? Python 2 and Python 3 have different default 'import' behavior. You can try Python 2 if possible. RepeatHMM currently works well in Python 2.
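
The difference is probably the removal of implicit relative imports: in Python 2, "import getAlignment" inside the "scripts" package can resolve to the sibling file, while Python 3 only searches sys.path. The sketch below reproduces this with a throwaway package under Python 3. The names demo_pkg, demo_helper, main_mod and try_import are hypothetical and have nothing to do with RepeatHMM's code.

```python
import importlib
import os
import sys
import tempfile


def try_import(import_line):
    """Create a temporary package whose main_mod runs `import_line`, then import it.

    Returns "ok" on success, or the name of the ImportError subclass raised.
    """
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, "demo_pkg")
    os.mkdir(pkg_dir)
    files = {
        "__init__.py": "",
        "demo_helper.py": "VALUE = 1\n",
        "main_mod.py": import_line + "\n",
    }
    for name, body in files.items():
        with open(os.path.join(pkg_dir, name), "w") as handle:
            handle.write(body)

    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        importlib.import_module("demo_pkg.main_mod")
        return "ok"
    except ImportError as exc:
        return type(exc).__name__
    finally:
        # Undo the side effects so the function can be called repeatedly.
        sys.path.remove(root)
        for key in [k for k in sys.modules if k.startswith("demo_pkg")]:
            del sys.modules[key]


if __name__ == "__main__":
    # Python 2 style sibling import: fails under Python 3.
    print(try_import("import demo_helper"))
    # Explicit relative import: the Python 3 way to import a sibling module.
    print(try_import("from . import demo_helper"))
```

Under Python 3.6 or later the first call reports ModuleNotFoundError, the same error class shown in the traceback above, and the second call succeeds.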

efsalcedo commented 6 years ago

Hi liuqianhn,

bin> python --version
Python 3.6.5 :: Anaconda, Inc.

efsalcedo commented 6 years ago

I am new to Python. How can I choose to use version 2? I'm on a Mac with High Sierra, and I had problems installing Python 2.

liuqianhn commented 6 years ago

Hi @efsalcedo, until I make RepeatHMM run on Python 3, I suggest you build a separate virtual environment for Python 2.7. Please see issue 7 for how to set up an independent Python 2.7 for running RepeatHMM. It is easy.

Note that issue 7 assumes 'Anaconda' is not installed. Since you already have 'Anaconda' installed, you can start directly from "2. have your own python 2.7".

efsalcedo commented 6 years ago

Hi liuqianhn,

I followed the instructions in issue 7 starting from step 2, but now a different error appears when I run the script:

Traceback (most recent call last):
  File "repeatHMM.py", line 17, in <module>
    from scripts import myBAMhandler
  File "/Applications/RepeatHMM-master/bin/scripts/myBAMhandler.py", line 24, in <module>
    import getAlignment
  File "/Applications/RepeatHMM-master/bin/scripts/getAlignment.py", line 7, in <module>
    from UnsymmetricPairAlignment import UnsymmetricPairAlignment
  File "/Applications/RepeatHMM-master/bin/scripts/UnsymmetricPairAlignment/UnsymmetricPairAlignment.py", line 17, in <module>
    _UnsymmetricPairAlignment = swig_import_helper()
  File "/Applications/RepeatHMM-master/bin/scripts/UnsymmetricPairAlignment/UnsymmetricPairAlignment.py", line 16, in swig_import_helper
    return importlib.import_module('_UnsymmetricPairAlignment')
  File "/Users/fernando/miniconda3/envs/py27/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named _UnsymmetricPairAlignment

liuqianhn commented 6 years ago

Hi @efsalcedo, did you go to the folder "bin/scripts/UnsymmetricPairAlignment" and run 'make' to compile the '.c' file successfully?

efsalcedo commented 6 years ago

Hi liuqianhn ,

You are right, I forgot to compile the program again after changing the Python version. Now the program runs, but I have one question. I am trying to find the number of CAG repeats in BAM files that were aligned to hg38. I'm not sure what to pass for the final parameter (--hgfile xx.fa) in the command I'm using. Do I need a separate file containing this genome?

I'm using: python repeatHMM.py BAMinput --Onebamfile IonCode_0109.bam --repeatName CAG --hgfile xx.fa

liuqianhn commented 6 years ago

Hi @efsalcedo, if your BAM files are aligned to hg38, it would be better to provide the path to hg38 in the last parameter (for example, "--hgfile /aa/bb/cc/dd/hg38.fa"), since the default path to hgfile is "./mhgversion/hg38.fa", which would not be correct on your system.

liuqianhn commented 6 years ago

Hi @efsalcedo, "--repeatName" should not be CAG here. "--repeatName" is the name of the locus you are interested in, for example "HTT" or "atxn1". You can find more in "bin/reference_sts/hg38/hg38.predefined.pa".

efsalcedo commented 6 years ago

liuqianhn I am interested in this specific position. What parameters can I use?

ar,chrX,67545318,67545386,CAG,+22,9-36:38-62,

liuqianhn commented 6 years ago

Hi @efsalcedo, if you are interested in 'ar', you can simply use 'ar' for '--repeatName', given that 'ar' is in hg38.predefined.pa.
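
To make the fields of such a predefined entry easier to read, here is a small sketch that splits the 'ar' line quoted above. The field names are an assumption: they follow the order given for "--UserDefinedRepeat" in this thread (chr, start_pos, end_pos, repeat_pattern, strand, range, others), preceded by the locus name. The parser itself (RepeatEntry, parse_predefined_line) is hypothetical and is not RepeatHMM's own code.

```python
from collections import namedtuple

# Assumed field order: locus name, then chr/start/end/pattern/strand/range/others.
RepeatEntry = namedtuple(
    "RepeatEntry",
    "name chrom start end pattern strand repeat_range others",
)


def parse_predefined_line(line):
    """Split one comma-separated entry into named fields; positions become integers."""
    fields = line.strip().split(",")
    if len(fields) != 8:
        raise ValueError("expected 8 comma-separated fields, got %d" % len(fields))
    name, chrom, start, end, pattern, strand, repeat_range, others = fields
    return RepeatEntry(name, chrom, int(start), int(end), pattern,
                       strand, repeat_range, others)


if __name__ == "__main__":
    entry = parse_predefined_line("ar,chrX,67545318,67545386,CAG,+22,9-36:38-62,")
    print(entry)
```

The fifth value after the name ("+22") is kept verbatim, since its exact meaning is not spelled out here.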

efsalcedo commented 6 years ago

Hi liuqianhn,

I think this is my last question. If I want to analyze another region that is not included in the "hg38.predefined.pa" file, do I simply add it to that file in the same format?

Thanks in advance.

liuqianhn commented 6 years ago

Hi @efsalcedo, yes, that is fine.

If you are interested in a region you define yourself, you can either (i) add it to "bin/reference_sts/hg38/hg38.predefined.pa", or (ii) use "--UserDefinedRepeat chr/start_pos/end_pos/repeat_pattern/strand/range/others/", where the first five fields are required while the remaining ones (range and others) can be empty.
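
As an illustration of option (ii), the sketch below assembles the slash-separated value from its parts. It is only a guess at the layout based on the template above (seven fields, each followed by "/"); the helper name user_defined_repeat is hypothetical, and whether RepeatHMM accepts exactly this string should be checked against its own documentation.

```python
def user_defined_repeat(chrom, start, end, pattern, strand,
                        repeat_range="", others=""):
    """Build a 'chr/start_pos/end_pos/repeat_pattern/strand/range/others/' string.

    The first five values are required; range and others may be left empty.
    """
    required = [chrom, start, end, pattern, strand]
    if any(str(value) == "" for value in required):
        raise ValueError("chr, start_pos, end_pos, repeat_pattern and strand are required")
    parts = [str(value) for value in required + [repeat_range, others]]
    # Every field, including the last one, is followed by a slash.
    return "/".join(parts) + "/"


if __name__ == "__main__":
    # Values copied from the 'ar' entry quoted earlier in this thread.
    print(user_defined_repeat("chrX", 67545318, 67545386, "CAG", "+22", "9-36:38-62"))
```

With the values from the 'ar' entry this produces chrX/67545318/67545386/CAG/+22/9-36:38-62//, where the empty "others" field accounts for the doubled slash at the end.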

Feel free to let me know if there is anything I can help with.

efsalcedo commented 6 years ago

Hi liuqianhn, I am using the following command, but I get an error message. Do I need to use a BAI file? If so, how should I include it in the command?

python repeatHMM.py BAMinput --Onebamfile IonCode_0109.bam --repeatName ar --hgfile hg38.fa

Warning!!!!!!!! the input BAM file not indexed /Users/fernando/CURSO_INCAN_Junio_3-4_2018/Sequences/IonCode_0109.bam
[main_samview] random alignment retrieval only works for indexed BAM or CRAM files.
[main_samview] random alignment retrieval only works for indexed BAM or CRAM files.
('ERROR None detection (sp)', 'ar', ['chrX', 67545318, 67545386, 'CAG', '+22', '9-36:38-62', ''])
p2sp end---running time0 mem56720

liuqianhn commented 6 years ago

Hi @efsalcedo, please run "samtools index IonCode_0109.bam" before running RepeatHMM.
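
To confirm that the index exists before starting a run, a check along these lines may help. It is a minimal sketch that only looks for a ".bai" file next to the BAM (either "name.bam.bai" or "name.bai") and ignores other index formats such as ".csi"; the function name find_bam_index is made up for illustration.

```python
import os


def find_bam_index(bam_path):
    """Return the path of a .bai index sitting next to the BAM file, or None."""
    candidates = [
        bam_path + ".bai",                       # IonCode_0109.bam.bai
        os.path.splitext(bam_path)[0] + ".bai",  # IonCode_0109.bai
    ]
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    index = find_bam_index("IonCode_0109.bam")
    if index is None:
        print("No index found; run: samtools index IonCode_0109.bam")
    else:
        print("Index found:", index)
```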

efsalcedo commented 6 years ago

Hi liuqianhn,

Is it necessary to specify the location of the BAI file? I am getting these messages after running the command:

[E::bwa_idx_load_from_disk] fail to locate the index files
[main_samview] region "chrX:67544318-67546386" specifies an unknown reference name. Continue anyway.
[main_samview] region "X:67544318-67546386" specifies an unknown reference name. Continue anyway.
p2sp end---running time307 mem76660

  p2sp ['ar', 23.0, [0, 0], 'allocr:', 0, 66421]

p2sp

ar 0 0;

('The result is in', 'logbam/RepBAM_ar.gmm_GapCorrection1_FlankLength30_SplitAndReAlign1_2_7_4_80_10_100_hg38_comp_I0.120_D0.020_S0.020.log')

liuqianhn commented 6 years ago

Hi @efsalcedo, you do not need to provide the location of the BAI file. Usually the BAI file is in the same folder as the BAM file and is named like "xxxx.bam.bai" for "xxxx.bam". From the folder where you run RepeatHMM, you can run samtools view IonCode_0109.bam chrX:67544318-67546386 to check whether it runs successfully.
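
An "unknown reference name" message usually means the region's chromosome name does not appear in the BAM header, so one thing to check is which names the header actually lists. The sketch below parses the @SQ lines from header text such as the output of "samtools view -H IonCode_0109.bam". The helper names and the sample header are made up for illustration and are not taken from the file in this thread.

```python
def reference_names(header_text):
    """Collect the SN: values from the @SQ lines of a SAM/BAM header."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def matching_name(names, wanted):
    """Return the header name equal to `wanted` with or without a 'chr' prefix, else None."""
    bare = wanted[3:] if wanted.startswith("chr") else wanted
    for candidate in (wanted, bare, "chr" + bare):
        if candidate in names:
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical header text; replace it with the real `samtools view -H` output.
    sample_header = (
        "@HD\tVN:1.4\tSO:coordinate\n"
        "@SQ\tSN:chr1\tLN:248956422\n"
        "@SQ\tSN:chrX\tLN:156040895\n"
    )
    names = reference_names(sample_header)
    print(names)
    print(matching_name(names, "X"))
```

If matching_name returns None for both "chrX" and "X", the BAM was probably aligned to a reference that names the chromosome differently.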

efsalcedo commented 6 years ago

Hi liuqianhn,

What document can I consult to understand how to interpret the results? I do not understand whether this means that no repeat was found:

p2sp ['ar', 23.0, [0, 0], 'allocr:', 0, 66421]

p2sp ar 0 0;

liuqianhn commented 6 years ago

Hi @efsalcedo, it seems that no repeat could be found here. For the interpretation of the results, please refer to here. May I ask what the coverage of your BAM file is?

liuqianhn commented 6 years ago

Closed due to no new activity. Feel free to re-open it when you need more help.