ebi-pf-team / interproscan

Genome-scale protein function classification
Apache License 2.0

Error in running Interpro using test files #6

Closed: lilipeng closed this issue 8 years ago

lilipeng commented 8 years ago

Hi,

I've installed InterProScan on my Mac (OS X) and tried a test run from the terminal as follows:

$ ./interproscan.sh -f TSV -i ~/Documents/interproscan-5.18-57.0/test_proteins_new.fasta -b ~/Documents/interproscan-5.18-57.0/output_test.tsv

Unfortunately, running the command above produces 'fatal' errors. I've attached the full output log to this message.

log.txt

Can someone kindly assist me in rectifying this issue?

Thank you, Lili

mifraser commented 8 years ago

Hi Lili,

Thank you for your interest in InterProScan! This software does not support Mac OS X; it is written for a Linux environment. From https://github.com/ebi-pf-team/interproscan/wiki/InstallationRequirements:

"InterProScan is developed to run on Linux. There are no versions planned for Windows or Apple (MAC OS X) operating systems. This is due to constraints in the various third-party binaries that InterProScan runs."

Kind regards,

Matt

lilipeng commented 8 years ago

Hi Matt,

I'm actually running it in a Linux terminal on my MacBook, and I still get the error.

Lili

mifraser commented 8 years ago

Hi Lili,

The binaries that we distribute with InterProScan should work on most Linux systems. However, they may not work on every system. If that happens, you need to compile the binary on your own system.

From the errors, it looks like you need to compile your own hmmer3 and ncoils binaries and make sure InterProScan uses them instead of the supplied ones.

Instructions for that are given here:

https://github.com/ebi-pf-team/interproscan/wiki/CompilingBinaries
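
Before recompiling, it can help to confirm which bundled binaries actually fail to start. The helper below is a hypothetical sketch, not part of InterProScan or the linked instructions. It takes the path of one binary (the exact locations under the installation's bin/ directory differ between releases, so none are assumed here) and runs it once with no arguments. It relies on standard shell exit statuses: 126 (cannot execute, e.g. wrong binary format), 127 (not found, which glibc's dynamic loader also uses for missing shared libraries) and values above 128 (killed by a signal, such as SIGILL when a binary uses CPU instructions the machine lacks). It is only suitable for tools that exit promptly when started without arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of InterProScan): can this binary start here?
# Returns 0 if it starts, 1 if the OS or dynamic loader rejects it,
# 2 if the path is missing or not executable.
binary_runs() {
  local bin="$1"
  local status=0
  if [[ ! -x "$bin" ]]; then
    echo "missing or not executable: $bin" >&2
    return 2
  fi
  # Run with no arguments and empty stdin so tools that read input do not
  # block. A usage message with a non-zero exit still counts as "starts".
  "$bin" </dev/null >/dev/null 2>&1 || status=$?
  # 126: cannot execute (e.g. wrong binary format)
  # 127: not found; also reported for missing shared libraries
  # >128: killed by a signal, e.g. SIGILL from an unsupported instruction
  if (( status == 126 || status == 127 || status > 128 )); then
    echo "cannot run on this system: $bin (exit status $status)" >&2
    return 1
  fi
  return 0
}
```

A binary that returns 1 here is a candidate for recompiling as described on the CompilingBinaries page.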

Hope that helps!

Matt

gsn7 commented 8 years ago

Hi Lili, you have so many errors that something is probably wrong with your environment. What is the output of the following command?

$ uname -a

lilipeng commented 8 years ago

Hi,

It is as follows:

$ uname -a
Darwin local 12.6.0 Darwin Kernel Version 12.6.0: Wed Mar 18 16:23:48 PDT 2015; root:xnu-2050.48.19~1/RELEASE_X86_64 x86_64

Thanks, Lili

gsn7 commented 8 years ago

Hi Lili, thanks for this info. The "Darwin" kernel confirms that your system is an OS X machine, and that's why you are getting all those errors. As a colleague mentioned in a previous message, InterProScan works on, and has only been tested on, Linux. OS X does have a Unix terminal, but it does not give you a Linux environment.
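
The diagnosis above comes down to the kernel name: "uname -s" reports "Darwin" on OS X and "Linux" on Linux, whichever terminal application is used. A wrapper script could make the same check before launching InterProScan. The function below is a minimal sketch under that assumption; require_linux is a hypothetical name, not part of InterProScan.

```shell
#!/usr/bin/env bash
# Pre-flight check: InterProScan 5 is documented as Linux-only, so stop early
# on any other kernel ("Darwin" on OS X, as in the uname output above).
# An explicit kernel name may be passed for testing; otherwise `uname -s`
# is queried.
require_linux() {
  local kernel="${1:-$(uname -s)}"
  if [[ "$kernel" != "Linux" ]]; then
    echo "InterProScan requires Linux; this system reports '$kernel'." >&2
    return 1
  fi
}
```

For example, "require_linux || exit 1" near the top of a script that calls interproscan.sh.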

Your organisation might have a Linux server or cluster; if you have access, you should install InterProScan there. Otherwise, you can use the InterProScan web services documented here: http://www.ebi.ac.uk/Tools/webservices/services/pfa/iprscan5_rest

Let us know if you need more help.

Gift
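
As an illustration of the web-service route, the sketch below submits a sequence, polls for completion and downloads the TSV result with curl. It assumes the current EBI Job Dispatcher REST layout (base URL https://www.ebi.ac.uk/Tools/services/rest/iprscan5 with run, status and result endpoints, an email parameter and a "tsv" result type), which may differ from the older page linked above; the function names are hypothetical. Check the service documentation before relying on any of these details.

```shell
#!/usr/bin/env bash
# Sketch of the InterProScan 5 web-service workflow: submit, poll, fetch TSV.
# ASSUMPTIONS: base URL, endpoint names, the "email" parameter and the "tsv"
# result type follow the EBI Job Dispatcher REST layout; verify before use.
IPRSCAN_BASE="${IPRSCAN_BASE:-https://www.ebi.ac.uk/Tools/services/rest/iprscan5}"
CURL="${CURL:-curl}"   # can be overridden, e.g. CURL=echo for a dry run

# Submit one protein sequence; the service replies with a job identifier.
ipr_submit() {
  local email="$1" sequence="$2"
  "$CURL" -s "$IPRSCAN_BASE/run" \
    --data-urlencode "email=$email" \
    --data-urlencode "sequence=$sequence"
}

# Poll every 10 seconds while the job is queued or running; print final status.
ipr_wait() {
  local job="$1" status
  while true; do
    status=$("$CURL" -s "$IPRSCAN_BASE/status/$job")
    case "$status" in
      QUEUED|RUNNING) sleep 10 ;;
      *) echo "$status"; return 0 ;;
    esac
  done
}

# Print the TSV result of a finished job to stdout.
ipr_result_tsv() {
  local job="$1"
  "$CURL" -s "$IPRSCAN_BASE/result/$job/tsv"
}
```

A possible session: job=$(ipr_submit you@example.org "$SEQ"), then ipr_wait "$job", and, if it prints FINISHED, ipr_result_tsv "$job" > result.tsv. Setting CURL=echo prints the requests instead of sending them.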

lilipeng commented 8 years ago

Hi,

Yes, I've had the system admin in my organization install it on our Linux cluster. He says that "there isn't much documentation" on installing InterProScan.

Here is the command I ran:

~$ ~/local/interproscan/interproscan-5.18-57.0/interproscan.sh -f TSV ~/local/interproscan/interproscan-5.18-57.0/test_proteins_new.fasta -b output_test.tsv

However, I did not get an output file, only the following message (beware, it's a bit long):

06/06/2016 15:27:47:414 Welcome to InterProScan-5.18-57.0
usage: java -XX:+UseParallelGC -XX:ParallelGCThreads=2 -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -Xms128M -Xmx2048M -jar interproscan-5.jar

Please give us your feedback by sending an email to

interhelp@ebi.ac.uk

-appl,--applications Optional, comma separated list of analyses. If this option is not set, ALL analyses will be run.
-b,--output-file-base Optional, base output filename (relative or absolute path). Note that this option, the --output-dir (-d) option and the --outfile (-o) option are mutually exclusive. The appropriate file extension for the output format(s) will be appended automatically. By default the input file path/name will be used.
-d,--output-dir Optional, output directory. Note that this option, the --outfile (-o) option and the --output-file-base (-b) option are mutually exclusive. The output filename(s) are the same as the input filename, with the appropriate file extension(s) for the output format(s) appended automatically.
-dp,--disable-precalc Optional. Disables use of the precalculated match lookup service. All match calculations will be run locally.
-f,--formats Optional, case-insensitive, comma separated list of output formats. Supported formats are TSV, XML, GFF3, HTML and SVG. Default for protein sequences are TSV, XML and GFF3, or for nucleotide sequences GFF3 and XML.
-goterms,--goterms Optional, switch on lookup of corresponding Gene Ontology annotation (IMPLIES -iprlookup option)
-i,--input Optional, path to fasta file that should be loaded on Master startup. Alternatively, in CONVERT mode, the InterProScan 5 XML file to convert.
-iprlookup,--iprlookup Also include lookup of corresponding InterPro annotation in the TSV and GFF3 output formats.
-ms,--minsize Optional, minimum nucleotide size of ORF to report. Will only be considered if n is specified as a sequence type. Please be aware of the fact that if you specify a too short value it might be that the analysis takes a very long time!
-o,--outfile Optional explicit output file name (relative or absolute path). Note that this option, the --output-dir (-d) option and the --output-file-base (-b) option are mutually exclusive. If this option is given, you MUST specify a single output format using the -f option. The output file name will not be modified. Note that specifying an output file name using this option OVERWRITES ANY EXISTING FILE.
-pa,--pathways Optional, switch on lookup of corresponding Pathway annotation (IMPLIES -iprlookup option)
-t,--seqtype Optional, the type of the input sequences (dna/rna (n) or protein (p)). The default sequence type is protein.
-T,--tempdir Optional, specify temporary file directory (relative or absolute path). The default location is temp/.

Copyright © EMBL European Bioinformatics Institute, Hinxton, Cambridge, UK. (http://www.ebi.ac.uk) The InterProScan software itself is provided under the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0.html). Third party components (e.g. member database binaries and models) are subject to separate licensing - please see the individual member database websites for details.

Available analyses:
ProSitePatterns (20.119) : PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them
SUPERFAMILY (1.75) : SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes.
Gene3D (3.5.0) : Structural assignment for whole genes and genomes using the CATH domain structure database
Hamap (201511.02) : High-quality Automated and Manual Annotation of Microbial Proteomes
Pfam (29.0) : A large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs)
Coils (2.2.1) : Prediction of Coiled Coil Regions in Proteins
ProSiteProfiles (20.119) : PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them
ProDom (2006.1) : ProDom is a comprehensive set of protein domain families automatically generated from the UniProt Knowledge Database.
TIGRFAM (15.0) : TIGRFAMs are protein families based on Hidden Markov Models or HMMs
SMART (7.1) : SMART allows the identification and analysis of domain architectures based on Hidden Markov Models or HMMs
PRINTS (42.0) : A fingerprint is a group of conserved motifs used to characterise a protein family
PIRSF (3.01) : The PIRSF concept is being used as a guiding principle to provide comprehensive and non-overlapping clustering of UniProtKB sequences into a hierarchical order to reflect their evolutionary relationships.

Deactivated analyses:
Phobius (1.01) : Analysis Phobius is deactivated, because the resources expected at the following paths do not exist: /opt/az/local/interproscan/interproscan-5.18-57.0/bin/phobius/1.01/phobius.pl
SignalP_GRAM_POSITIVE (4.1) : Analysis SignalP_GRAM_POSITIVE is deactivated, because the resources expected at the following paths do not exist: /opt/az/local/interproscan/interproscan-5.18-57.0/bin/signalp/4.1/signalp
TMHMM (2.0c) : Analysis TMHMM is deactivated, because the resources expected at the following paths do not exist: /opt/az/local/interproscan/interproscan-5.18-57.0/bin/tmhmm/2.0c/decodeanhmm, /opt/az/local/interproscan/interproscan-5.18-57.0/data/tmhmm/2.0c/TMHMM2.0c.model
PANTHER (10.0) : Analysis Panther is deactivated, because the resources expected at the following paths do not exist: /opt/az/local/interproscan/interproscan-5.18-57.0/data/panther/10.0/model
SignalP_EUK (4.1) : Analysis SignalP_EUK is deactivated, because the resources expected at the following paths do not exist: /opt/az/local/interproscan/interproscan-5.18-57.0/bin/signalp/4.1/signalp
SignalP_GRAM_NEGATIVE (4.1) : Analysis SignalP_GRAM_NEGATIVE is deactivated, because the resources expected at the following paths do not exist: /opt/az/local/interproscan/interproscan-5.18-57.0/bin/signalp/4.1/signalp
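
Each deactivated analysis above is reported because files were missing at specific paths. The sketch below (a hypothetical helper, not part of InterProScan) checks those same locations relative to an installation directory, which in this log is /opt/az/local/interproscan/interproscan-5.18-57.0. The relative paths are copied from the 5.18-57.0 output above and may differ in other releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which resources named in the "Deactivated
# analyses" log exist under an InterProScan installation directory.
# Paths are taken from the 5.18-57.0 log; other releases may differ.
check_optional_resources() {
  local ipr_home="$1" rel missing=0
  local resources=(
    "bin/phobius/1.01/phobius.pl"      # Phobius
    "bin/signalp/4.1/signalp"          # SignalP (EUK, GRAM_POSITIVE, GRAM_NEGATIVE)
    "bin/tmhmm/2.0c/decodeanhmm"       # TMHMM binary
    "data/tmhmm/2.0c/TMHMM2.0c.model"  # TMHMM model
    "data/panther/10.0/model"          # PANTHER models
  )
  for rel in "${resources[@]}"; do
    if [[ -e "$ipr_home/$rel" ]]; then
      echo "present: $rel"
    else
      echo "missing: $rel"
      missing=$((missing + 1))
    fi
  done
  # Exit status is non-zero when anything is missing, so callers can branch.
  (( missing == 0 ))
}
```

For example, check_optional_resources /opt/az/local/interproscan/interproscan-5.18-57.0 prints one present/missing line per resource and returns a non-zero status if anything is absent.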

lilipeng commented 8 years ago

Actually, the system admin and I were able to figure out the issue. Thanks for all your help.
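
The thread does not record what the fix was. One difference between the two commands may be relevant: the first passes the FASTA file with -i, while the second gives the path with no option in front of it, and that run printed the usage text instead of producing output. Separately, the usage text says -b takes a base name to which the format extension is appended automatically, so -b output_test.tsv would be expected to produce output_test.tsv.tsv. The wrapper below is a hypothetical sketch that applies both points; it is not part of InterProScan.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of InterProScan) for a TSV-only run.
# Passes the FASTA file via -i and strips a trailing ".tsv" from the -b
# value, since InterProScan appends the format extension to the base name.
output_base() {
  local name="$1"
  echo "${name%.tsv}"
}

run_ipr_tsv() {
  local ipr_sh="$1" fasta="$2" out="$3"
  if [[ ! -f "$fasta" ]]; then
    echo "input FASTA not found: $fasta" >&2
    return 1
  fi
  "$ipr_sh" -f TSV -i "$fasta" -b "$(output_base "$out")"
}
```

For example: run_ipr_tsv ~/local/interproscan/interproscan-5.18-57.0/interproscan.sh ~/local/interproscan/interproscan-5.18-57.0/test_proteins_new.fasta output_test.tsv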
