boscoh / inmembrane

BSD 2-Clause "Simplified" License

Inmembrane locally installed TMHMM, SignalP, LipoP #4

Closed Dimpledavray287 closed 4 years ago

Dimpledavray287 commented 4 years ago

Hi Boscoh, I am interested in identifying surface-exposed proteins in Lactobacillus using the inmembrane tool. I have approx. 7000 sequences, and for this I need locally installed TMHMM, SignalP and LipoP, which I have now installed. I don't know where to change the paths in the inmembrane files so that TMHMM, SignalP and LipoP are run locally through inmembrane instead of through the web services.

Thanks Dimple

pansapiens commented 4 years ago

If you run inmembrane_scan once (with no extra args), it will generate a default config file, inmembrane.config, in the current working directory. You can edit this to change the paths to the various tools (e.g. the signalp4_bin, lipop1_bin and tmhmm_bin variables).

It works best if you put the executables for each tool on your PATH environment variable and just use e.g. 'signalp4_bin': 'signalp', in the inmembrane.config file.
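To check that a bare name such as 'signalp' will be found the way the shell finds it, Python 3's shutil.which can be run over the *_bin entries. This is a minimal sketch, not part of inmembrane: unresolved_bins is a hypothetical helper and the config dict is a made-up subset holding only the tool-path keys.

```python
import shutil

# Hypothetical subset of inmembrane.config: only the tool-path (*_bin) keys.
config = {
    'signalp4_bin': 'signalp',
    'lipop1_bin': 'LipoP',
    'tmhmm_bin': 'tmhmm',
}


def unresolved_bins(config):
    """Return the *_bin entries whose value is not found as an executable.

    Hypothetical helper, not part of inmembrane. shutil.which searches PATH
    for bare names and checks full paths directly.
    """
    missing = {}
    for key, value in config.items():
        if key.endswith('_bin') and isinstance(value, str):
            if shutil.which(value) is None:
                missing[key] = value
    return missing


for key, value in unresolved_bins(config).items():
    print(f"{key}: '{value}' was not found on PATH")
```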

Dimpledavray287 commented 4 years ago

Thanks @pansapiens for your prompt reply.

I had put the executables for each tool on the PATH environment variable. When I ran it on a small number of sequences (fewer than 500), it worked fine (Output1 below). When I increased the number of sequences above 1000, it threw an error (Output2 below).

Commands I used to move the executable files into a directory on my PATH:

A) SignalP: I moved all the SignalP 4.1 executables that were inside its 'bin' directory. Command: sudo mv nnhowplayer.Linux_x86_64 /usr/local/bin/nnhowplayer.Linux_x86_64 (likewise for all the other nnhowplayer binaries).

B) TMHMM: bin/decodeanhmm (binary executable). Command: sudo mv decodeanhmm.Linux_x86_64 /usr/local/bin/decodeanhmm.Linux_x86_64 (likewise for all the other decodeanhmm binaries).

C) LipoP: similarly, I moved all the files from LipoP to /usr/local/bin.

Output1:

 Number of proteins in each class:
# CYTOPLASM(non-PSE)    379
# MEMBRANE(non-PSE) 87
# PSE(total)        32
#   PSE-Cellwall    7
#   PSE-Lipoprotein 8
#   PSE-Membrane    17
# SECRETED          13
# 
# Output written to /home/dimple/Phd/inmembrane-0.95.0/Trylatest5.csv
# 
# This run used SignalP 4.1, LipoP 1.0 (web interface), HMMER 3.0, TMHMM 2.0.
# References have been written to /home/dimple/Phd/inmembrane-0.95.0/Trylatest5/citations.txt 
# - please cite as appropriate.

Output2:

dimple@dimple-VirtualBox[inmembrane-0.95.0] inmembrane_scan Trylatest6.txt  
/home/dimple/.local/lib/python2.7/site-packages/BeautifulSoup.py:114: UserWarning: You are using a very old release of Beautiful Soup, last updated in 2011. If you installed the 'beautifulsoup' package through pip, you should know the 'beautifulsoup' package name is about to be reclaimed by a more recent version of Beautiful Soup which is incompatible with this version.

This will happen at some point after January 1, 2021.

If you just started this project, this is easy to fix. Install the 'beautifulsoup4' package instead of 'beautifulsoup' and start using Beautiful Soup 4.

If this is an existing project that depends on Beautiful Soup 3, the project maintainer (potentially you) needs to start the process of migrating to Beautiful Soup 4. This should be a relatively easy part of the Python 3 migration.

  """)
# inmembrane 0.95.0 (https://github.com/boscoh/inmembrane)
# Loading existing inmembrane.config
# SignalP(scrape_web), input.fasta > signalp_scrape_web.out
Traceback (most recent call last):
  File "/usr/local/bin/inmembrane_scan", line 87, in <module>
    inmembrane.process(params)
  File "/usr/local/lib/python2.7/dist-packages/inmembrane/__init__.py", line 139, in process
    plugin.annotate(params, proteins)
  File "/usr/local/lib/python2.7/dist-packages/inmembrane/plugins/signalp_scrape_web.py", line 111, in annotate
    pollingurl = soup.findAll('a')[0]['href']
IndexError: list index out of range
dimple@dimple-VirtualBox[inmembrane-0.95.0]
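Reading the traceback: signalp_scrape_web.py takes soup.findAll('a')[0]['href'], i.e. it assumes the page returned by the SignalP web service contains at least one link. The IndexError means the returned page had no <a> tags at all, so the scraper got a response it did not expect (the thread does not establish exactly why). As a Python 3 illustration only, the hypothetical helper first_link_href below does the same lookup with the standard-library html.parser instead of BeautifulSoup, and returns None rather than raising when there is no link.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value).
        if tag == 'a':
            href = dict(attrs).get('href')
            if href is not None:
                self.hrefs.append(href)


def first_link_href(html):
    """Return the first <a href> in `html`, or None if the page has no links.

    Hypothetical helper, not inmembrane code. Indexing an empty list with [0],
    as the scraper in the traceback does, raises IndexError instead.
    """
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs[0] if collector.hrefs else None


print(first_link_href('<html><body>Your job is queued</body></html>'))
```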
pansapiens commented 4 years ago

From the error, it looks as if it's using the SignalP web service rather than the locally installed version. You should check that 'signalp4_bin': 'signalp' is set in your inmembrane.config and that there is no signalp_scrape_web in the config.

If you post your inmembrane.config here it might help diagnose.
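Following that advice, a small Python 3 check can flag any config value ending in _scrape_web, since those names select the web-service plugins rather than a local binary. The function name web_scraper_settings is hypothetical and not part of inmembrane; the sketch assumes the config has already been loaded into a dict.

```python
def web_scraper_settings(config):
    """Return the config entries whose value names a *_scrape_web plugin.

    Hypothetical helper, not part of inmembrane. Non-string values (such as
    the 'helix_programs' list) are ignored.
    """
    return {
        key: value
        for key, value in config.items()
        if isinstance(value, str) and value.endswith('_scrape_web')
    }


# Made-up example config mixing web and local settings.
example = {'signalp4_bin': 'signalp_scrape_web', 'lipop1_bin': 'LipoP'}
for key, value in web_scraper_settings(example).items():
    print(f"{key} = {value!r}: switch this to the local executable name")
```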

Dimpledavray287 commented 4 years ago

You may be right. Here is my inmembrane.config file:

{
  'fasta': '',
  'csv': '',
  'out_dir': '',
  'protocol': 'gram_pos', # 'gram_neg'

#### Signal peptide and transmembrane helix prediction
#   'signalp4_bin': 'signalp',
  'signalp4_bin': 'signalp_scrape_web',
#   'lipop1_bin': 'LipoP',
  'lipop1_bin': 'lipop_scrape_web',
#   'tmhmm_bin': 'tmhmm',
  'tmhmm_bin': 'tmhmm_scrape_web',
   'memsat3_bin': 'runmemsat',
  'helix_programs': ['tmhmm'],
# 'helix_programs': ['tmhmm', 'memsat3'],
  'terminal_exposed_loop_min': 50, # unused in gram_neg protocol
  'internal_exposed_loop_min': 100, # try 30 for gram_neg

#### Sequence similarity and motif prediction
  'hmmsearch3_bin': 'hmmsearch',
  'hmm_evalue_max': 0.1,
  'hmm_score_min': 10,

#### Outer membrane beta-barrel predictors
  'barrel_programs': ['tmbetadisc-rbf'],
# 'barrel_programs': ['bomp', 'tmbetadisc-rbf'],
  'bomp_clearly_cutoff': 3, # if >= than this, always classify as an OM(barrel)
  'bomp_maybe_cutoff': 1, # must also have a signal peptide to be OM(barrel)
  'tmbetadisc_rbf_method': 'aadp', # aa, dp, aadp or pssm
}
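The file above is written as a Python dict literal, and each local setting is commented out with #, so only the *_scrape_web values are active. A Python 3 sketch, assuming the file can be parsed as a plain dict literal (it uses ast.literal_eval; how inmembrane itself reads the file is not shown in this thread), makes that visible on an excerpt of the posted config. load_config is a hypothetical name.

```python
import ast

# Excerpt of the config posted above. Python comments ('#') are ignored by the
# parser, so the commented-out local settings have no effect.
CONFIG_TEXT = """
{
  'protocol': 'gram_pos', # 'gram_neg'
#   'signalp4_bin': 'signalp',
  'signalp4_bin': 'signalp_scrape_web',
#   'lipop1_bin': 'LipoP',
  'lipop1_bin': 'lipop_scrape_web',
#   'tmhmm_bin': 'tmhmm',
  'tmhmm_bin': 'tmhmm_scrape_web',
}
"""


def load_config(text):
    """Parse a config written as a Python dict literal (hypothetical helper)."""
    return ast.literal_eval(text.strip())


active = load_config(CONFIG_TEXT)
for key in ('signalp4_bin', 'lipop1_bin', 'tmhmm_bin'):
    print(key, '->', active[key])
```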
Dimpledavray287 commented 4 years ago

I am not sure, but I see two possibilities:

1) We need to change the inmembrane_scan file, because it loads all the files in the plugins directory, and that directory contains both signalp_scrape_web.py and signalp4.py:

import inmembrane
# from inmembrane import helpers
from inmembrane.helpers import *
# will load all plugins in the plugins/ directory
from inmembrane.plugins import *
import unittest 

2) The tool executables are not properly added to the PATH environment variable:

dimple@dimple-VirtualBox[inmembrane] echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/cd-hit:/usr/lib/cd-hit

dimple@dimple-VirtualBox[inmembrane] cd /usr/local/bin               
dimple@dimple-VirtualBox[bin] ls                                     
aaindexextract             maskfeat
abiview                    maskseq
acdc                       matcher
acdgalaxy                  mb-multi
acdlog                     megamerger
acdpretty                  melt.pl
acdtable                   merger
acdtrace                   mesquite
acdvalid                   MImapqtl
ace_contig_coverage.pl     mrbayes
ace_split.pl               mrbayes-multi
act                        msatfinder
aligncopy                  msbar
aligncopypair              mspcrunch
alimask                    MSPcrunch.LIN
antigenic                  MultiRegress
archaeopteryx              mview
art                        mwcontam
artemis                    mwfilter
assemblyget                nd_clip
backtranambig              needle
backtranseq                needleall
banana                     newcpgreport
big_blast                  newcpgseek
big-blast                  newseq
big_blast.pl               nhmmer
big-blast.pl               nhmmscan
biosed                     nnhowplayer.Linux_i386
blast2sam                  nnhowplayer.Linux_i486
blixem                     nnhowplayer.Linux_i586
blixem.LIN                 nnhowplayer.Linux_i686
bowtie2sam                 nnhowplayer.Linux_ia64
BTmapqtl                   nnhowplayer.Linux_x86_64
btwisted                   nohtml
cachedas                   noreturn
cachedbfetch               nospace
cacheebeyesearch           notab
cacheensembl               notseq
caf2ace                    novo2sam
caf2fastq                  nrdb
caf2gap                    nrdb.linux
caf2phrap                  nrdb.linux-x86
caf_build_consensus        nthseq
cafcat                     nthseqset
caf_check_pads             ocount
caf_depad                  octanol
cafmerge                   oddcomp
caf_pad                    oligoarray
cai                        oligoarray_cl
cap3                       om
catchall                   omdecode
chaos                      omegamap
charge                     omegaMap
checktrans                 omegamaptp
chips                      omegaMapTP
cirdna                     omorder
clxcoarse                  ompermute
clxdo                      omsummarize
cn3d                       omTP
codcmp                     ontocount
codcopy                    ontoget
coderet                    ontogetcommon
compseq                    ontogetdown
cons                       ontogetobsolete
consambig                  ontogetroot
cpgplot                    ontogetsibs
cpgreport                  ontogetup
create_pan_genome          ontoisobsolete
create_pan_genome_plots.R  ontotext
ct2rnaml                   palindrome
ct-energy                  pan_genome_assembly_statistics
cusp                       pan_genome_core_alignment
cutgextract                pan_genome_post_analysis
cutseq                     pan_genome_reorder_spreadsheet
cytoscape                  parallel_all_against_all_blastp
cytoscape.sh               pasteseq
dan                        patmatdb
dbiblast                   patmatmotifs
dbifasta                   pepcoil
dbiflat                    pepdigest
dbigcg                     pepinfo
dbtell                     pepnet
dbxcompress                pepstats
dbxedam                    pepwheel
dbxfasta                   pepwindow
dbxflat                    pepwindowall
dbxgcg                     phmmer
dbxobo                     phrapcons
dbxreport                  pip
dbxresource                pip2
dbxstat                    pip2.7
dbxtax                     plotcon
dbxuncompress              plotorf
decodeanhmm.Linux_i386     polydot
decodeanhmm.Linux_i486     preg
decodeanhmm.Linux_i586     Preplot
decodeanhmm.Linux_i686     prettyplot
decodeanhmm.Linux_ia64     prettyseq
decodeanhmm.Linux_x86_64   priam
degapseq                   primersearch
dendroscope                printsextract
density                    profit
descseq                    prophecy
diffseq                    prophet
distmat                    prosextract
dotmatcher                 protein_alignment_from_nucleotides
dotpath                    prove
dotter                     Prune
dotter.LIN                 pscan
dottup                     psiphi
dreg                       psl2sam
drfinddata                 Qstats
drfindformat               query_pan_genome
drfindid                   rbs_finder
drfindresource             Rcross
drget                      rebaseextract
drtext                     recoder
edamdef                    redata
edamhasinput               refseqget
edamhasoutput              remap
edamisformat               restover
edamisid                   restrict
edamname                   revseq
edialign                   Rmap
einverted                  roary
Emap                       roary-create_pan_genome_plots.R
embossdata                 roary-pan_genome_reorder_spreadsheet
embossupdate               roary-query_pan_genome
embossversion              roary-unique_genes_per_sample
emma                       roche2gap
emowse                     roche454ace2gap
entret                     roche454ace2gap.sh
envpath                    Rqtl
epestfind                  runJemboss.sh
eprimer3                   sam2vcf
eprimer32                  seealso
Eqtl                       seqcount
equicktandem               seqmatchall
est2genome                 seqret
etandem                    seqretsetall
export2sam                 seqretsplit
extractalign               seqxref
extractfeat                seqxrefget
extract_proteome_from_gff  servertell
extractseq                 showalign
fasta                      showdb
fasta36_t                  showfeat
fastf                      showorf
fastf36_t                  showpep
fastm                      showseq
fastm36_t                  showserver
fastqc                     shuffleseq
fasts                      sigcleave
fasts36_t                  signalp
fastx                      signalp.1
fastx36_t                  signalp-4.1
fasty                      silent
fasty36_t                  sirna
featcopy                   sixpack
featmerge                  sizeseq
featreport                 skipredundant
feattext                   skipseq
findkm                     soap2sam
findrule                   splitsource
fix_quals                  splitstree
formcon                    splitter
freak                      squint
fuzznuc                    SRmapqtl
fuzzpro                    ss-count.pl
fuzztran                   ssearch
gap2caf                    ssearch36_t
garnier                    stars
geecee                     stars-setup
getorf                     startkde
ggsearch                   stretcher
ggsearch36_t               stssearch
glimmer3                   supermatcher
glsearch                   syco
glsearch36_t               taxget
godef                      taxgetdown
goname                     taxgetrank
happy                      taxgetspecies
helixturnhelix             taxgetup
hmmalign                   taxinspector
hmmbuild                   tbl2asn
hmmconvert                 tcode
hmmemit                    tetra
hmmfetch                   textget
hmmlogo                    textsearch
hmmpgmd                    tfastf
hmmpress                   tfastf36_t
hmmscan                    tfastm
hmmsearch                  tfastm36
hmmsim                     tfasts
hmmstat                    tfasts36_t
hmoment                    tfastx
h-num.pl                   tfastx36_t
hybrid-min                 tfasty
hybrid-ss-min              tfasty36_t
iep                        tfextract
infoalign                  tfm
infoassembly               tfscan
infobase                   tmap
inforesidue                tmhmm-2.0c
infoseq                    tranalign
inmembrane_scan            transeq
interpolate_sam            transfer_annotation_to_groups
isochore                   treeview
iterative_cdhit            trimest
jackhmmer                  trimseq
jaspextract                trimspace
jaspscan                   twofeat
jembossctl                 union
jmotu                      urlget
JZmapqtl                   variationget
l4p-tmpl                   vectorstrip
lalign                     water
lav2ps                     wgsim_eval
lav2svg                    whichdb
lindna                     wobble
LipoP                      wordcount
LipoP1.0a                  wordfinder
LipoP1.0a.html             wordmatch
LipoP1.0.mod               wossdata
lipop_decode               wossinput
LipoPformat                wossname
listor                     wossoperation
long-orfs                  wossoutput
LRmapqtl                   wossparam
makehmmerdb                wosstopic
makenucseq                 xmlget
makeprotseq                xmltext
map_db                     yank
marscan                    yapp
maskambignuc               Zmapqtl
maskambigprot              zoom2sam
dimple@dimple-VirtualBox[bin] 
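The ls listing shows signalp, LipoP and tmhmm-2.0c in /usr/local/bin, which is on the PATH printed above. A Python 3 sketch of the same check, roughly what which -a does, walks PATH in order and reports every directory holding an executable with a given name. path_dirs_containing is a hypothetical helper, and it skips empty PATH entries for simplicity.

```python
import os


def path_dirs_containing(name, path=None):
    """Return each PATH directory containing an executable file named `name`.

    Hypothetical helper. Directories are listed in search order with duplicates
    dropped (the PATH above lists /usr/lib/cd-hit twice); the first hit is what
    the shell runs. Empty entries, which POSIX treats as the current
    directory, are skipped for simplicity.
    """
    if path is None:
        path = os.environ.get('PATH', '')
    hits, seen = [], set()
    for directory in path.split(os.pathsep):
        if not directory or directory in seen:
            continue
        seen.add(directory)
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(directory)
    return hits


for tool in ('signalp', 'LipoP', 'tmhmm-2.0c'):
    print(tool, '->', path_dirs_containing(tool) or 'not found on PATH')
```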
pansapiens commented 4 years ago

It looks like you do need to change your inmembrane.config file - remove any of the lines with _scrape_web.

The section with the signalp/tmhmm/lipop settings should look like:

  'signalp4_bin': 'signalp',
  'lipop1_bin': 'LipoP',
  'tmhmm_bin': 'tmhmm-2.0c',

Note the executable name for tmhmm, based on what you have in /usr/local/bin, is tmhmm-2.0c. Putting the full path (/usr/local/bin/tmhmm-2.0c) in the config file should work too, if I remember correctly.
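If you prefer pinning full paths in inmembrane.config, which per the comment above should also work (not verified here), a Python 3 sketch can turn each bare *_bin name into the path found on PATH via shutil.which. pin_full_paths is a hypothetical helper; entries it cannot resolve are left as they are so they can be fixed by hand.

```python
import shutil


def pin_full_paths(config):
    """Return a copy of `config` with each *_bin name replaced by its full path.

    Hypothetical helper, not part of inmembrane. Values that shutil.which
    cannot resolve, and keys that are not *_bin, are copied unchanged.
    """
    pinned = dict(config)
    for key, value in config.items():
        if key.endswith('_bin') and isinstance(value, str):
            resolved = shutil.which(value)
            if resolved is not None:
                pinned[key] = resolved
    return pinned


# The settings suggested above, as bare names.
suggested = {'signalp4_bin': 'signalp', 'lipop1_bin': 'LipoP', 'tmhmm_bin': 'tmhmm-2.0c'}
print(pin_full_paths(suggested))
```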

The binaries for the tools are in your PATH, so that part looks fine. Also ensure you've followed the SignalP etc install instructions - they also require copying the content of lib to /usr/local/lib (see http://www.cbs.dtu.dk/services/doc/signalp-5.0.readme, or the README in the SignalP tarball - seems the 4.1 readme link is dead now).

To be sure, you can try running each tool on its own with a small FASTA file to make sure they are installed correctly (the inmembrane_scan -t -n command tests that all the locally installed tools run as expected).
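Besides inmembrane_scan -t -n, you can smoke-test a tool by hand. This Python 3.7+ sketch writes a two-sequence FASTA file, runs a command with that file as its last argument, and reports whether it exited with status 0. It assumes each tool accepts a FASTA path as a positional argument with default options; check each tool's README for the options you actually need. smoke_test and the test sequences are made up for illustration.

```python
import os
import subprocess
import tempfile

# Two short made-up protein sequences, only for checking that a tool starts up.
TINY_FASTA = """>test1
MKKLLIAGALLLSGCSSNDKAEAPKAEETAPAQ
>test2
MSTNPKPQRKTKRNTNRRPQDVKFPGG
"""


def smoke_test(command, timeout=120):
    """Run `command` with a tiny FASTA file appended as the last argument.

    Hypothetical helper. Returns (ok, output), where ok means exit status 0.
    A missing executable or a timeout is reported as (False, message).
    """
    with tempfile.NamedTemporaryFile('w', suffix='.fasta', delete=False) as handle:
        handle.write(TINY_FASTA)
        fasta_path = handle.name
    try:
        result = subprocess.run(command + [fasta_path], capture_output=True,
                                text=True, timeout=timeout)
        return result.returncode == 0, result.stdout + result.stderr
    except (OSError, subprocess.TimeoutExpired) as error:
        return False, str(error)
    finally:
        os.remove(fasta_path)


# Assumed invocations with default options; adjust per each tool's README.
for tool in (['signalp'], ['LipoP'], ['tmhmm-2.0c']):
    ok, output = smoke_test(tool)
    print(tool[0], 'OK' if ok else 'FAILED')
```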

No need to modify inmembrane_scan - all the plugins are loaded at startup, but only the ones set via inmembrane.config are actually used during the protocol.
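To illustrate why loading a plugin is not the same as using it, here is a Python 3 sketch of config-driven dispatch. It is hypothetical and not inmembrane's actual code: both stand-in plugins are defined (much as both signalp4.py and signalp_scrape_web.py are imported), but only the one picked from the signalp4_bin value is ever called.

```python
def local_signalp(proteins, executable):
    """Stand-in for a plugin that would run a locally installed binary."""
    return {name: f'annotated by {executable}' for name in proteins}


def web_signalp(proteins, executable):
    """Stand-in for a plugin that would scrape the web service."""
    return {name: 'annotated by web service' for name in proteins}


def choose_signalp_plugin(config):
    """Pick one plugin from the config value; the other is never called."""
    value = config['signalp4_bin']
    if value.endswith('_scrape_web'):
        return web_signalp
    return local_signalp


def annotate(config, proteins):
    """Run only the selected plugin, passing the configured executable name."""
    plugin = choose_signalp_plugin(config)
    return plugin(proteins, config['signalp4_bin'])


print(annotate({'signalp4_bin': 'signalp'}, ['prot1']))
```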

Dimpledavray287 commented 4 years ago

Thank you very much, @pansapiens. It worked!

Number of proteins in each class:
# CYTOPLASM(non-PSE)    5040
# MEMBRANE(non-PSE) 1187
# PSE(total)        513
#   PSE-Cellwall    108
#   PSE-Membrane    405
# SECRETED          264
# 
# Output written to /home/dimple/Phd/inmembrane-0.95.0/Hypothetical.csv
# 
# This run used SignalP 4.0, LipoP 1.0, HMMER 3.0, TMHMM 2.0.
# References have been written to /home/dimple/Phd/inmembrane-0.95.0/Hypothetical/citations.txt 
# - please cite as appropriate.
dimple@dimple-VirtualBox[inmembrane-0.95.0]
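Each class line in the summary has the form '# LABEL count', as in both outputs in this thread. Assuming only that pattern, a Python 3 sketch can parse the counts and sanity-check them: the PSE sub-classes add up to PSE(total) (108 + 405 = 513), and the top-level classes sum to 7004, in line with the roughly 7000 sequences mentioned at the start. parse_class_counts is a hypothetical helper.

```python
import re

# Class lines copied from the successful run above.
SUMMARY = """\
# CYTOPLASM(non-PSE)    5040
# MEMBRANE(non-PSE) 1187
# PSE(total)        513
#   PSE-Cellwall    108
#   PSE-Membrane    405
# SECRETED          264
"""

# '# ', a label without spaces, whitespace, then an integer count.
CLASS_LINE = re.compile(r'^#\s+(\S+)\s+(\d+)\s*$')


def parse_class_counts(text):
    """Map each class label in a summary to its protein count (hypothetical)."""
    counts = {}
    for line in text.splitlines():
        match = CLASS_LINE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = parse_class_counts(SUMMARY)
subclasses = sum(v for k, v in counts.items() if k.startswith('PSE-'))
print(counts['PSE(total)'] == subclasses)
```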
pansapiens commented 4 years ago

Great!