Closed efsalcedo closed 6 years ago
Hi @efsalcedo , it seems that "biopython" cannot be found. Did you install "biopython", and is the folder where it is installed on Python's package search path?
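One way to check both points at once is to ask the interpreter itself. The snippet below is only a sketch: it assumes Python 3.4 or later (for `importlib.util.find_spec`) and uses `Bio`, the import name that appears in the traceback, rather than the pip package name `biopython`. The helper name `describe_module` is made up for illustration.

```python
import importlib.util
import sys


def describe_module(name):
    """Say whether the top-level module `name` is importable, and from where."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return "%s: not found on sys.path of %s" % (name, sys.executable)
    return "%s: found at %s" % (name, spec.origin)


if __name__ == "__main__":
    # "Bio" is the module that biopython installs.
    print(describe_module("Bio"))
    # The folders this interpreter searches for packages.
    for entry in sys.path:
        print("  search path:", entry)
```

If the message says "not found", the `python` that runs repeatHMM.py is probably not the same interpreter that `pip` installed into.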
Hi liuqianhn, Thanks for your quick response. I imagine that biopython was installed with the following command that I executed: pip install peakutils==1.0.3 hmmlearn sklearn biopython
I'm trying to re-install:
pip install biopython
Collecting biopython
Cache entry deserialization failed, entry ignored
Downloading https://files.pythonhosted.org/packages/6a/22/c5b6e425d7ed86a52fe10be670b95513b43e0853908d70a984d9a68a9945/biopython-1.72-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (2.2MB)
100% |████████████████████████████████| 2.2MB 93kB/s
Requirement already satisfied: numpy in /Users/fernando/miniconda3/lib/python3.6/site-packages (from biopython) (1.14.5)
Installing collected packages: biopython
Successfully installed biopython-1.72
After reinstalling biopython I now get a different error message:
python repeatHMM.py IonCode_0109.bam
Traceback (most recent call last):
File "repeatHMM.py", line 17, in
Hi @efsalcedo , may I know whether you are running RepeatHMM with Python 3? Python 2 and Python 3 have different default 'import' behavior. Please try Python 2 if possible. RepeatHMM currently works well with Python 2.
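To see which major version a given `python` command actually starts, a check like the following can be run with that same command. It is a minimal sketch and the function name `uses_python2` is hypothetical; it is not part of RepeatHMM.

```python
import sys


def uses_python2(version_info=None):
    """Return True when the interpreter (or the given version tuple) is Python 2."""
    if version_info is None:
        version_info = sys.version_info
    return version_info[0] == 2


if __name__ == "__main__":
    print("Running Python %d.%d" % (sys.version_info[0], sys.version_info[1]))
    if not uses_python2():
        print("This is not Python 2; the maintainer suggests Python 2 for now.")
```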
Hi liuqianhn,
bin> python --version
Python 3.6.5 :: Anaconda, Inc.
I am new to Python. How can I switch to version 2? I'm on a Mac with High Sierra and I had problems installing Python 2.
Hi @efsalcedo , until I make RepeatHMM run on Python 3, I suggest you build a separate virtual environment for Python 2.7. Please feel free to read issue 7 for how to build an independent Python 2.7 for running RepeatHMM. It is easy.
Please note that issue 7 assumes that 'Anaconda' is not installed. Since you have already installed 'Anaconda', you can start directly from "2. have your own python 2.7".
Hi liuqianhn,
I followed the instructions in issue 7 from step 2, but now other errors appear when I run the script:
Traceback (most recent call last):
File "repeatHMM.py", line 17, in
Hi @efsalcedo , did you go to the folder "bin/scripts/UnsymmetricPairAlignment" and run 'make' to compile the '.c' file?
Hi liuqianhn ,
You are right, I forgot to recompile the program after changing the Python version. Now the program runs, but I have one question. I am trying to find the number of CAG repeats in BAM files that were aligned to hg38. I'm not sure what the final parameter (--hgfile xx.fa) should be in the command I'm using. Do I need an additional file containing this genome?
I'm using: python repeatHMM.py BAMinput --Onebamfile IonCode_0109.bam --repeatName CAG --hgfile xx.fa
Hi @efsalcedo , if your BAM files are aligned to hg38, it would be better to provide the path to hg38 in the last parameter (for example, "--hgfile /aa/bb/cc/dd/hg38.fa"), since the default path to hgfile is "./mhgversion/hg38.fa", which would not be correct on your system.
Hi @efsalcedo , "--repeatName" should not be CAG here. "--repeatName" is the locus you are interested in, for example "HTT" or "atxn1". You can find more in "bin/reference_sts/hg38/hg38.predefined.pa".
liuqianhn I am interested in this specific position. What parameters can I use?
ar,chrX,67545318,67545386,CAG,+22,9-36:38-62,
Hi @efsalcedo, if you are interested in ‘ar’, you can simply use ‘ar’ for ‘--repeatName’, given that ‘ar’ is in hg38.predefined.pa.
Hi liuqianhn,
I think this is my last question. If I want to find another region that is not included in the "hg38.predefined.pa" file, do I simply need to add it to this same file, in the same format?
Thanks in advance.
Hi @efsalcedo , yes, that is OK.
If you are interested in a region you define yourself, you can either (i) add it to "bin/reference_sts/hg38/hg38.predefined.pa", or (ii) use "--UserDefinedRepeat chr/start_pos/end_pos/repeat_pattern/strand/range/others/", where the first five fields are required while the remaining ones (range and others) can be empty.
Feel free to let me know if there is anything I can help with.
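For illustration, an entry in the comma-separated style quoted earlier in this thread can be turned into the slash-separated form described above. This is only a sketch under assumptions: the helper name `predefined_to_user_defined` is hypothetical, the mapping simply follows the field order given by the maintainer (each field followed by "/"), and whether RepeatHMM accepts empty trailing fields written this way has not been verified here. The value `+22` in the strand position is copied verbatim from the 'ar' entry.

```python
def predefined_to_user_defined(line):
    """Convert 'name,chr,start,end,pattern,strand,range,others' to (name, slash form)."""
    fields = line.strip().split(",")
    name, rest = fields[0], fields[1:]
    # The first five fields after the name are required.
    if len(rest) < 5 or not all(rest[:5]):
        raise ValueError("need chr, start_pos, end_pos, repeat_pattern and strand")
    # Pad the optional fields (range, others) so there are always seven slots.
    rest = (rest + ["", ""])[:7]
    # Literal reading of the template: every field is followed by '/'.
    return name, "".join(field + "/" for field in rest)


if __name__ == "__main__":
    name, value = predefined_to_user_defined(
        "ar,chrX,67545318,67545386,CAG,+22,9-36:38-62,"
    )
    print(name)
    print("--UserDefinedRepeat", value)
```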
Hi liuqianhn, I am using the following command, but I get an error message. Do I need to use a BAI file? If so, how should I include it in the command?
python repeatHMM.py BAMinput --Onebamfile IonCode_0109.bam --repeatName ar --hgfile hg38.fa
Warning!!!!!!!! the input BAM file not indexed /Users/fernando/CURSO_INCAN_Junio_3-4_2018/Sequences/IonCode_0109.bam
[main_samview] random alignment retrieval only works for indexed BAM or CRAM files.
[main_samview] random alignment retrieval only works for indexed BAM or CRAM files.
('ERROR None detection (sp)', 'ar', ['chrX', 67545318, 67545386, 'CAG', '+22', '9-36:38-62', ''])
p2sp end---running time0 mem56720
Hi @efsalcedo , please run "samtools index IonCode_0109.bam" before running RepeatHMM.
Hi liuqianhn,
Is it necessary to indicate the location of the BAI file? I am receiving these messages after running the command:
[E::bwa_idx_load_from_disk] fail to locate the index files
[main_samview] region "chrX:67544318-67546386" specifies an unknown reference name. Continue anyway.
[main_samview] region "X:67544318-67546386" specifies an unknown reference name. Continue anyway.
p2sp end---running time307 mem76660
p2sp ['ar', 23.0, [0, 0], 'allocr:', 0, 66421]
p2sp
ar 0 0;
('The result is in', 'logbam/RepBAM_ar.gmm_GapCorrection1_FlankLength30_SplitAndReAlign1_2_7_4_80_10_100_hg38_comp_I0.120_D0.020_S0.020.log')
Hi @efsalcedo , you do not need to provide the location of the BAI file. Usually the BAI file is in the same folder as the BAM file and is named "xxxx.bam.bai" for "xxxx.bam". From the folder where you run RepeatHMM, you can run
samtools view IonCode_0109.bam chrX:67544318-67546386
to see whether it runs successfully.
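The two checks above can be scripted before launching RepeatHMM. The following is a minimal sketch, not part of RepeatHMM: the function names are hypothetical, it only looks for the "xxxx.bam.bai" naming mentioned above, and it builds the `samtools view` command as a list without running it.

```python
import os


def bam_index_path(bam_path):
    """Return the path of 'xxxx.bam.bai' next to the BAM file, or None if missing."""
    candidate = bam_path + ".bai"
    return candidate if os.path.isfile(candidate) else None


def samtools_view_command(bam_path, chrom, start, end):
    """Build the region query suggested in this thread as an argument list."""
    return ["samtools", "view", bam_path, "%s:%d-%d" % (chrom, start, end)]


if __name__ == "__main__":
    bam = "IonCode_0109.bam"
    if bam_index_path(bam) is None:
        print("No index found; run: samtools index " + bam)
    print(" ".join(samtools_view_command(bam, "chrX", 67544318, 67546386)))
```

The list returned by `samtools_view_command` can be passed to `subprocess.call` if samtools is on the PATH.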
Hi liuqianhn,
What document can I consult to understand how to interpret the results? I do not understand whether this means that no repeat was found:
p2sp ['ar', 23.0, [0, 0], 'allocr:', 0, 66421]
p2sp ar 0 0;
Hi @efsalcedo , it seems that no repeat could be found here. For the interpretation of the results, please refer to here. May I ask what the coverage of your BAM file is?
Closed due to no new activity. Feel free to re-open it when you need more help.
I installed the program according to the instructions. Now I'm trying to run the repeatHMM.py script, but I get an error message about a module used by the included scripts.
python repeatHMM.py /Users/fernando/CURSO_INCAN_Junio_3-4_2018/Sequences/IonCode.bam
Traceback (most recent call last):
  File "repeatHMM.py", line 17, in <module>
    from scripts import myBAMhandler
  File "/Applications/RepeatHMM-master/bin/scripts/myBAMhandler.py", line 21, in <module>
    from Bio import pairwise2
ModuleNotFoundError: No module named 'Bio'