eead-csic-compbio / get_homologues

GET_HOMOLOGUES: a versatile software package for pan-genome analysis

error locating Pfam-A.hmm after successful install #3

Closed jbadomics closed 8 years ago

jbadomics commented 8 years ago

I successfully downloaded and installed the latest get_homologues release:

$ perl install.pl 

### 1) checking required parts: 

## checking mcl-14-137 (lib/phyTools: $ENV{'EXE_MCL'})
>> OK
## checking COGsoft/COGreadblast (lib/phyTools: $ENV{'EXE_READBLAST'})
>> OK
## checking COGsoft/COGtriangles (lib/phyTools: $ENV{'EXE_COGTRI'})
>> OK
## checking COGsoft/COGmakehash (lib/phyTools: $ENV{'EXE_MAKEHASH'})
>> OK
## checking COGsoft/COGlse (lib/phyTools: $ENV{'EXE_COGLSE'})
>> OK
## Checking blast (lib/phyTools: $ENV{'EXE_BLASTP'})
>> OK

### 2) checking optional parts: 

## checking optional HMMER binaries (lib/phyTools: $ENV{'EXE_HMMPFAM'})
# required by get_homologues.pl -D
>> OK
## checking optional PFAM library (lib/phyTools: $ENV{'PFAMDB'})
# required by get_homologues.pl -D and get_homologues-est.pl -D
# cannot locate Pfam-A, would you like to download it now? [Y/n]
Y
# connecting to ftp.ebi.ac.uk ...
# downloading ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release//Pfam-A.hmm.gz (242.8Mb) ...
# [        50%       ]
# ####################

# gunzip Pfam-A.hmm.gz ...
# pressing Pfam-A.hmm ...
Working...    done.
Pressed and indexed 16295 HMMs (16295 names and 16295 accessions).
Models pressed into binary file:   Pfam-A.hmm.h3m
SSI index for binary model file:   Pfam-A.hmm.h3i
Profiles (MSV part) pressed into:  Pfam-A.hmm.h3f
Profiles (remainder) pressed into: Pfam-A.hmm.h3p
>> OK
## checking optional SWISSPROT library (lib/phyTools: $ENV{'BLASTXDB'})
# required by transcripts2cds.pl and transcripts2cdsCPP.pl
# cannot locate SWISSPROT, would you like to download it now? [Y/n]
Y
# connecting to ftp.ncbi.nih.gov ...
# downloading ftp://ftp.ncbi.nih.gov//blast/db//swissprot.tar.gz (117.0Mb) ...
# [        50%       ]
# ####################

# untar swissprot.tar.gz ...
>> OK
## checking optional software R (lib/phyTools: $ENV{'EXE_R'})
# required by compare_clusters.pl, parse_pangenome_matrix.pl -s, plot_pancore_matrix.pl
>> OK
## checking optional Perl module GD
# required by parse_pangenome_matrix.pl -p
>> OK

### 3) Your get_homologues kit is now fully functional

But when I run get_homologues.pl on a small dataset, it fails with an error:

$ ./get_homologues.pl -n 4 -d /data/test_geo_cyto/ -t 0 -D -A -z -M
# ./get_homologues.pl -i 0 -d /data/test_geo_cyto -o 0 -e 0 -f 0 -r 0 -t 0 -c 0 -z 0 -I 0 -m local -n 4 -M 1 -G 0 -P 0 -C 75 -S 1 -E 1e-05 -F 1.5 -N 0 -B 50 -b 0 -s 0 -D 1 -g 0 -a '0' -x 0 -R 0 -A 1

# results_directory=/sw/get_homologues-macosx-20160113/test_geo_cyto_homologues
# parameters: MAXEVALUEBLASTSEARCH=0.01 MAXPFAMSEQS=5000 BATCHSIZE=100 KEEPSCNDHSPS=1

# checking input files...
# Geobacter_metallireducens_GS-15.cytochromes.concatenated.faa 58
# Geobacter_pickeringii_G13.cytochromes.faa 58
# Geobacter_soli.cytochromes.concatenated.faa 58
# Geobacter_sulfurreducens_PCA.cytochromes.faa 72

#4 genomes, 246 sequences

# taxa considered = 4 sequences = 246 residues = 119322 MIN_BITSCORE_SIM = 16.2

# mask=GeobactermetallireducensGS-15_f0_0taxa_algOMCL_Pfam_e0_ (_algOMCL_Pfam)

# submitting Pfam HMMER jobs ... 
# ERROR: cannot find database file /sw/get_homologues-macosx-20160113/db/Pfam-A.hmm
# EXIT: failed while running localPfam search (/sw/get_homologues-macosx-20160113/_split_hmmscan.pl 4 100 /sw/get_homologues-macosx-20160113//bin/hmmer-3.1b2/binaries/hmmscan --noali --acc --cut_ga  --cpu 1 /sw/get_homologues-macosx-20160113/db/Pfam-A.hmm  /sw/get_homologues-macosx-20160113/test_geo_cyto_homologues/_Geobacter_metallireducens_GS-15.cytochromes.concatenated.faa.fasta0 > /sw/get_homologues-macosx-20160113/test_geo_cyto_homologues/_Geobacter_metallireducens_GS-15.cytochromes.concatenated.faa.fasta0.pfam )

On inspection, neither Pfam-A.hmm.gz nor Pfam-A.hmm is present in the get_homologues-macosx-20160113/db directory.
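A check like this can be scripted. The following is a minimal sketch, not part of get_homologues: `check_pfam_db` is a hypothetical helper name, and the list of file names is taken from the hmmpress output in the install log above (Pfam-A.hmm plus its .h3m, .h3i, .h3f and .h3p companions). It assumes that every one of these files should be present and non-empty.

```shell
#!/bin/sh
# Hypothetical helper (not part of get_homologues): report which Pfam files
# are missing from a db directory. File names come from the hmmpress output
# shown in the install log.
check_pfam_db() {
    db_dir=$1
    missing=0
    for suffix in "" .h3m .h3i .h3f .h3p; do
        file="$db_dir/Pfam-A.hmm$suffix"
        # -s is true only if the file exists and is larger than zero bytes
        if [ ! -s "$file" ]; then
            echo "missing or empty: $file"
            missing=1
        fi
    done
    return "$missing"
}

# Run the check only when a directory is passed on the command line.
if [ "$#" -gt 0 ]; then
    check_pfam_db "$1"
fi
```

Saved under any name, it could be run as `sh check_pfam_db.sh get_homologues-macosx-20160113/db`; a non-zero exit status means at least one file was not found.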

I was able to resolve the issue by re-downloading Pfam-A.hmm.gz and manually decompressing it in get_homologues-macosx-20160113/db:

cd get_homologues-macosx-20160113/db
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz
gunzip -c Pfam-A.hmm.gz > Pfam-A.hmm

After this, re-running the same get_homologues.pl command completed successfully. Is the install.pl script mistakenly removing Pfam-A.hmm from get_homologues-macosx-20160113/db?
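For anyone hitting the same error on an affected release, the manual workaround could be wrapped so that it is safe to re-run. This is only a sketch under stated assumptions: `restore_pfam_hmm` is a hypothetical helper name, `wget` and `gunzip` are assumed to be on the PATH, and the URL is the one used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of get_homologues): recreate Pfam-A.hmm in a
# db directory, downloading the archive only if no local copy exists.
PFAM_URL="ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz"

restore_pfam_hmm() {
    db_dir=$1
    hmm="$db_dir/Pfam-A.hmm"
    archive="$hmm.gz"

    # Nothing to do if the flat file is already present and non-empty.
    if [ -s "$hmm" ]; then
        return 0
    fi

    # Fetch the archive only when it is not already in the directory;
    # drop any partial download if wget fails.
    if [ ! -s "$archive" ]; then
        wget -O "$archive" "$PFAM_URL" || { rm -f "$archive"; return 1; }
    fi

    # gunzip -c writes to stdout, so the .gz is kept. Decompress to a
    # temporary name first so an interrupted run does not leave a
    # truncated Pfam-A.hmm behind.
    gunzip -c "$archive" > "$hmm.tmp" && mv "$hmm.tmp" "$hmm"
}

# Run only when a directory is passed on the command line.
if [ "$#" -gt 0 ]; then
    restore_pfam_hmm "$1"
fi
```

The sketch restores only the flat Pfam-A.hmm file. If the .h3m, .h3i, .h3f and .h3p indexes written during installation were also missing, running `hmmpress` on the restored file would presumably be needed as well.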

Here is the output of get_homologues.pl -v (OS X 10.11; hmmer 3.1 and blast+/legacy blast are installed separately):

$ ./get_homologues.pl -v

./get_homologues.pl version 2.0 (2015)

Program written by Bruno Contreras-Moreira (1) and Pablo Vinuesa (2).

 1: http://www.eead.csic.es/compbio (Estacion Experimental Aula Dei/CSIC/Fundacion ARAID, Spain)
 2: http://www.ccg.unam.mx/~vinuesa (Center for Genomic Sciences, UNAM, Mexico)

Primary citation (PubMed:24096415):

 Contreras-Moreira B, Vinuesa P. (2013) GET_HOMOLOGUES, a versatile software package for scalable and
 robust microbial pangenome analysis. Appl Environ Microbiol 79(24):7696-701. doi: 10.1128/AEM.02411-13

This software employs code, binaries and data from different authors, please cite them accordingly:
 OrthoMCL v1.4 (www.orthomcl.org , PubMed:12952885)
 COGtriangles v2.1 (sourceforge.net/projects/cogtriangles , PubMed=20439257)
 NCBI Blast-2.2 (blast.ncbi.nlm.nih.gov , PubMed=9254694,20003500)
 Bioperl v 1.5.2 (www.bioperl.org , PubMed=12368254)
 HMMER 3.1b2 (hmmer.org)
 Pfam (pfam.sanger.ac.uk , PubMed=24288371)

Checking required binaries and data sources, all set in phyTools.pm :
        EXE_BLASTP : OK (path:/sw/get_homologues-macosx-20160113/bin/ncbi-blast-2.2.27+/bin/blastp)
        EXE_BLASTN : OK (path:/sw/get_homologues-macosx-20160113/bin/ncbi-blast-2.2.27+/bin/blastn)
      EXE_FORMATDB : OK (path:/sw/get_homologues-macosx-20160113/bin/ncbi-blast-2.2.27+/bin/makeblastdb)
           EXE_MCL : OK (path:/sw/get_homologues-macosx-20160113//bin/mcl-14-137/src/shmcl/mcl)
      EXE_MAKEHASH : OK (path:/sw/get_homologues-macosx-20160113//bin/COGsoft/COGmakehash/COGmakehash )
     EXE_READBLAST : OK (path:/sw/get_homologues-macosx-20160113//bin/COGsoft/COGreadblast/COGreadblast )
        EXE_COGLSE : OK (path:/sw/get_homologues-macosx-20160113//bin/COGsoft/COGlse/COGlse )
        EXE_COGTRI : OK (path:/sw/get_homologues-macosx-20160113//bin/COGsoft/COGtriangles/COGtriangles )
       EXE_HMMPFAM : OK (/sw/get_homologues-macosx-20160113//bin/hmmer-3.1b2/binaries/hmmscan --noali --acc --cut_ga  /sw/get_homologues-macosx-20160113/db/Pfam-A.hmm)
        EXE_INPARA : OK (path:/sw/get_homologues-macosx-20160113/_cluster_makeInparalog.pl)
         EXE_ORTHO : OK (path:/sw/get_homologues-macosx-20160113/_cluster_makeOrtholog.pl)
         EXE_HOMOL : OK (path:/sw/get_homologues-macosx-20160113/_cluster_makeHomolog.pl)
    EXE_SPLITBLAST : OK (path:/sw/get_homologues-macosx-20160113/_split_blast.pl)
  EXE_SPLITHMMPFAM : OK (path:/sw/get_homologues-macosx-20160113/_split_hmmscan.pl)

eead-csic-compbio commented 8 years ago

Thanks again, Jon, for reporting this bug. It only affected local (as opposed to cluster) jobs, so a fix was added to _split_hmmscan.pl. The issue is fixed from release v2.0.4 onwards.