ebi-pf-team / interproscan

Genome-scale protein function classification
Apache License 2.0

Error running test file #173

Closed ShailNair closed 3 years ago

ShailNair commented 3 years ago

Hi, unfortunately I received an error while running the test file.

Linux info: Linux mcs1 3.10.0-1127.18.2.el7.x86_64 #1 SMP Sun Jul 26 15:27:06 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

python3 --version
Python 3.7.8
java -version
openjdk version "11.0.8-internal" 2020-07-14

I downloaded and unzipped InterProScan 5.47-82.0 and ran the initial setup. When I run the test file, I get the following error:

ERROR - Command line failed with exit code: 1
Command: bin/hmmer/hmmer3/3.3/hmmscan -E 0.01 --acc --cpu 10 -o /home/mcs/soft/interproscan-5.47-82.0/temp/mcs1_20201125_091103548_klwo//jobPIRSF/000000000001_000000000006.raw.out --domtblout /home/mcs/soft/interproscan-5.47-82.0/temp/mcs1_20201125_091103548_klwo//jobPIRSF/000000000001_000000000006.raw.domtblout.out data/pirsf/3.10/sf_hmm_all /home/mcs/soft/interproscan-5.47-82.0/temp/mcs1_20201125_091103548_klwo//jobPIRSF/000000000001_000000000006.fasta
Error output from binary:

Error: File format problem, trying to open HMM file data/pirsf/3.10/sf_hmm_all. Opened data/pirsf/3.10/sf_hmm_all.h3m, a pressed HMM file; but format of its .h3i file unrecognized

Attached -terminal output-

terminal.txt

gsn7 commented 3 years ago

The error you get is usually seen when running InterProScan v5.47-82.0 for the first time. If you try running it again, it will likely succeed. If you still have problems, I suggest running the following command from the InterProScan installation directory:

python3 initial_setup.py

Afterwards, InterProScan should run as expected.

ShailNair commented 3 years ago

Hi, I tried 3-4 times with the test data and twice with my own data, and I am getting the same error. The python3 initial_setup.py command does not print anything (not even an error or success message).

gsn7 commented 3 years ago

I am surprised you are still getting the errors after running python3 initial_setup.py. The python3 initial_setup.py command doesn't print any message if it succeeds. What is the output of this command?

ls -l data/pirsf/3.10/sf_hmm_all*

ShailNair commented 3 years ago

ls -l data/pirsf/3.10/sf_hmm_all*
-rw-rw-r-- 1 mcs mcs 608224141 10月 8 17:39 data/pirsf/3.10/sf_hmm_all
-rw-rw-r-- 1 mcs mcs 1032192 11月 25 08:16 data/pirsf/3.10/sf_hmm_all.h3f
-rw-rw-r-- 1 mcs mcs 0 11月 25 08:16 data/pirsf/3.10/sf_hmm_all.h3i
-rw-rw-r-- 1 mcs mcs 2244608 11月 25 08:16 data/pirsf/3.10/sf_hmm_all.h3m
-rw-rw-r-- 1 mcs mcs 2633728 11月 25 08:16 data/pirsf/3.10/sf_hmm_all.h3p

Ignore the 月 symbol; it is the Chinese character for month.

ShailNair commented 3 years ago

When I run it on my samples, I get a similar error. Here is the run output:

for SET in $(cat list.txt)
do
/home/mcs/soft/interproscan-5.47-82.0/interproscan.sh -i /home/mcs/gene/shail/metagenomics/CLEANED_FILES/sunbeam_decontaminated/leptolyngbya_cleaned/sunbeam_output/qc/squeezemeta_all_contig/protein_seq/$SET-protein-sequences.fa \
-f tsv \
-o /home/mcs/gene/shail/metagenomics/CLEANED_FILES/sunbeam_decontaminated/leptolyngbya_cleaned/sunbeam_output/qc/squeezemeta_all_contig/protein_seq/$SET-interpro-output.tsv -cpu 80 \
-iprlookup \
-goterms \
-pa
done

25/11/2020 20:02:49:958 Welcome to InterProScan-5.47-82.0
25/11/2020 20:02:49:960 Running InterProScan v5 in STANDALONE mode... on Linux
25/11/2020 20:03:03:751 Loading file /home/mcs/gene/shail/metagenomics/CLEANED_FILES/sunbeam_decontaminated/leptolyngbya_cleaned/sunbeam_output/qc/squeezemeta_all_contig/protein_seq/01.100_days-protein-sequences.fa
25/11/2020 20:03:03:753 Running the following analyses:
[CDD-3.17,Coils-2.2.1,Gene3D-4.2.0,Hamap-2020_01,MobiDBLite-2.0,PANTHER-15.0,Pfam-33.1,PIRSF-3.10,PRINTS-42.0,ProSitePatterns-2019_11,ProSiteProfiles-2019_11,SFLD-4,SMART-7.1,SUPERFAMILY-1.75,TIGRFAM-15.0]
Available matches will be retrieved from the pre-calculated match lookup service.

Matches for any sequences that are not represented in the lookup service will be calculated locally.
25/11/2020 20:03:32:040 Uploaded 112359 unique sequences for analysis
25/11/2020 20:19:36:404 37% completed
25/11/2020 20:19:51:846 62% completed
25/11/2020 20:19:53:266 87% completed
2020-11-25 20:21:44,204 [amqEmbeddedWorkerJmsContainer-5] [uk.ac.ebi.interpro.scan.management.model.implementations.RunBinaryStep:199] ERROR - Command line failed with exit code: 1
Command: bin/hmmer/hmmer3/3.1b1/hmmsearch -Z 65000000 -E 0.001 --domE 0.00000001 --incdomE 0.00000001 --cpu 10 -o /home/mcs/gene/shail/metagenomics/CLEANED_FILES/sunbeam_decontaminated/leptolyngbya_cleaned/sunbeam_output/qc/squeezemeta_all_contig/fixeed_contig/temp/mcs1_20201125_200254341_1sp7//jobPanther/000000102001_000000102600.raw.out --domtblout /home/mcs/gene/shail/metagenomics/CLEANED_FILES/sunbeam_decontaminated/leptolyngbya_cleaned/sunbeam_output/qc/squeezemeta_all_contig/fixeed_contig/temp/mcs1_20201125_200254341_1sp7//jobPanther/000000102001_000000102600.raw.domtblout.out data/panther/15.0/panther.hmm /home/mcs/gene/shail/metagenomics/CLEANED_FILES/sunbeam_decontaminated/leptolyngbya_cleaned/sunbeam_output/qc/squeezemeta_all_contig/fixeed_contig/temp/mcs1_20201125_200254341_1sp7//jobPanther/000000102001_000000102600.fasta
Error output from binary:
Error: File format problem in trying to open HMM file data/panther/15.0/panther.hmm. Opened data/panther/15.0/panther.hmm.h3m, a pressed HMM file; but format of its .h3i file unrecognized

gsn7 commented 3 years ago

There is a problem with the indices in your data, as I can see here (the .h3i file is 0 bytes):

-rw-rw-r-- 1 mcs mcs 0 11月 25 08:16 data/pirsf/3.10/sf_hmm_all.h3i

You may have to investigate why the index files are not being generated. Try regenerating the indices for one database, such as PIRSF, by running:

bin/hmmer/hmmer3/3.3/hmmpress -f data/pirsf/3.10/sf_hmm_all

Then let me know what you get when you run this command:

ls -l data/pirsf/3.10/sf_hmm_all*
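
The same symptom (a 0-byte .h3i file) can be checked across every database at once instead of one directory at a time. The following is a minimal sketch and not part of InterProScan: the function name find_empty_indices is hypothetical, and it assumes the script is run from the InterProScan installation directory so that the relative path data resolves.

```python
from pathlib import Path


def find_empty_indices(data_dir):
    """Return the sorted paths of all zero-byte *.h3i files under data_dir."""
    root = Path(data_dir)
    return sorted(
        path
        for path in root.rglob("*.h3i")
        if path.is_file() and path.stat().st_size == 0
    )


if __name__ == "__main__":
    # Assumption: the current directory is the InterProScan installation
    # directory, so the HMM libraries live under ./data.
    for path in find_empty_indices("data"):
        print("empty index:", path)
```

Any path it prints is a candidate for re-running hmmpress -f on the matching HMM library.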

ShailNair commented 3 years ago

Here is the output:

(base) [mcs@mcs1 interproscan-5.47-82.0]$ bin/hmmer/hmmer3/3.3/hmmpress -f data/pirsf/3.10/sf_hmm_all
Working... done.
Pressed and indexed 3283 HMMs (3283 names and 3283 accessions).
Models pressed into binary file: data/pirsf/3.10/sf_hmm_all.h3m
SSI index for binary model file: data/pirsf/3.10/sf_hmm_all.h3i
Profiles (MSV part) pressed into: data/pirsf/3.10/sf_hmm_all.h3f
Profiles (remainder) pressed into: data/pirsf/3.10/sf_hmm_all.h3p
(base) [mcs@mcs1 interproscan-5.47-82.0]$ ls -l data/pirsf/3.10/sf_hmm_all*
-rw-rw-r-- 1 mcs mcs 608224141 10月 8 17:39 data/pirsf/3.10/sf_hmm_all
-rw-rw-r-- 1 mcs mcs 103345491 11月 25 22:11 data/pirsf/3.10/sf_hmm_all.h3f
-rw-rw-r-- 1 mcs mcs 328421 11月 25 22:11 data/pirsf/3.10/sf_hmm_all.h3i
-rw-rw-r-- 1 mcs mcs 252147360 11月 25 22:11 data/pirsf/3.10/sf_hmm_all.h3m
-rw-rw-r-- 1 mcs mcs 296057244 11月 25 22:11 data/pirsf/3.10/sf_hmm_all.h3p

gsn7 commented 3 years ago

The data looks OK now for PIRSF, but you need to do the same for the other HMM-based analyses. I will give you a list of commands.

ShailNair commented 3 years ago

OK. Thanks for the help

gsn7 commented 3 years ago

The following commands should do it:

bin/hmmer/hmmer3/3.3/hmmpress -f data/gene3d/4.2.0/gene3d_main.hmm
bin/hmmer/hmmer3/3.3/hmmpress -f data/hamap/2020_01/hamap.hmm.lib
bin/hmmer/hmmer3/3.3/hmmpress -f data/panther/15.0/panther.hmm
bin/hmmer/hmmer3/3.3/hmmpress -f data/pfam/33.1/pfam_a.hmm
bin/hmmer/hmmer3/3.1b1/hmmpress -f data/sfld/4/sfld.hmm
bin/hmmer/hmmer3/3.1b1/hmmpress -f data/superfamily/1.75/hmmlib_1.75
bin/hmmer/hmmer3/3.3/hmmpress -f data/tigrfam/15.0/TIGRFAMs_HMM.LIB
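
The commands above can also be run in one pass that reports which of them failed. This is only a sketch: the binary and library paths are copied from the list above (they are specific to InterProScan 5.47-82.0), while the names HMM_LIBRARIES, build_commands and press_all are hypothetical helpers introduced here for illustration.

```python
import subprocess

# (hmmpress binary, HMM library) pairs copied from the commands listed above.
HMM_LIBRARIES = [
    ("bin/hmmer/hmmer3/3.3/hmmpress", "data/gene3d/4.2.0/gene3d_main.hmm"),
    ("bin/hmmer/hmmer3/3.3/hmmpress", "data/hamap/2020_01/hamap.hmm.lib"),
    ("bin/hmmer/hmmer3/3.3/hmmpress", "data/panther/15.0/panther.hmm"),
    ("bin/hmmer/hmmer3/3.3/hmmpress", "data/pfam/33.1/pfam_a.hmm"),
    ("bin/hmmer/hmmer3/3.1b1/hmmpress", "data/sfld/4/sfld.hmm"),
    ("bin/hmmer/hmmer3/3.1b1/hmmpress", "data/superfamily/1.75/hmmlib_1.75"),
    ("bin/hmmer/hmmer3/3.3/hmmpress", "data/tigrfam/15.0/TIGRFAMs_HMM.LIB"),
]


def build_commands(libraries=HMM_LIBRARIES):
    """Build one 'hmmpress -f <library>' argument list per library."""
    return [[binary, "-f", library] for binary, library in libraries]


def press_all(commands, run=subprocess.run):
    """Run every command in order and return the ones that exited non-zero."""
    failed = []
    for command in commands:
        result = run(command)
        if result.returncode != 0:
            failed.append(command)
    return failed
```

Calling press_all(build_commands()) from the InterProScan installation directory runs the seven commands in order. An empty return value means every hmmpress call exited with status 0.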

gsn7 commented 3 years ago

I also notice from the error message that you have changed the cpu option for hmmer, as in:

bin/hmmer/hmmer3/3.3/hmmscan -E 0.01 --acc --cpu 10

Assigning 10 CPUs to one hmmer job may not improve performance, especially since you are running all the analyses in InterProScan. How many cores does your machine have?

ShailNair commented 3 years ago

Yes, in the interproscan.properties file I changed the CPU value to 10 for each job. I work on a server with 104 cores. After running the index commands you provided, I ran the test file and it executed successfully. After that, I ran InterProScan on my samples, and the run is in progress without any errors (I am running batches of samples). As you said, increasing the number of CPUs didn't improve performance. Is there a better way to improve performance without running into errors?

gsn7 commented 3 years ago

This page describes some tips for improving performance: https://interproscan-docs.readthedocs.io/en/latest/ImprovingPerformance.html

ShailNair commented 3 years ago

Hi, thanks. I will try it for the next run. The previous run completed successfully (though it took almost 2 days).

clwang4802 commented 3 years ago

I have a similar error message.

Then, when I ran the command

bin/hmmer/hmmer3/3.3/hmmpress -f data/hamap/2020_01/hamap.hmm.lib

it said:

Error: File existence/permissions problem in trying to open HMM file data/hamap/2020_01/hamap.hmm.lib

gsn7 commented 3 years ago

@clwang4802 What do you get when you run this command?

ls -l data/hamap/2020_01/hamap.hmm.lib*

clwang4802 commented 3 years ago

@gsn7 It seems the jobs that rebuild the hmm_all index files were not able to complete in a "dirty" directory.

I avoided the error by:

  1. deleting my previous iprscan directory
  2. downloading and untarring a new copy of interproscan*.tar.gz
  3. running "initial_setup.py" inside the new directory

clwang4802 commented 3 years ago

@gsn7 Before I removed the failed iprscan folder, I noticed that there was no "hamap/2020_01" folder, only "2020_05".

webbchen commented 3 years ago

I got rid of the error without deleting and reinstalling everything:

a) Rename the directory hamap/2020_05 to hamap/2020_01.
b) Run these commands, as suggested by @gsn7 earlier in the thread:

bin/hmmer/hmmer3/3.3/hmmpress -f data/gene3d/4.2.0/gene3d_main.hmm
bin/hmmer/hmmer3/3.3/hmmpress -f data/hamap/2020_01/hamap.hmm.lib
bin/hmmer/hmmer3/3.3/hmmpress -f data/panther/15.0/panther.hmm
bin/hmmer/hmmer3/3.3/hmmpress -f data/pfam/33.1/pfam_a.hmm
bin/hmmer/hmmer3/3.1b1/hmmpress -f data/sfld/4/sfld.hmm
bin/hmmer/hmmer3/3.1b1/hmmpress -f data/superfamily/1.75/hmmlib_1.75
bin/hmmer/hmmer3/3.3/hmmpress -f data/tigrfam/15.0/TIGRFAMs_HMM.LIB

c) Rename the directory back to its old name.

I suspect that adding a soft link named "2020_01" pointing to the offending directory would probably have saved the trouble of renaming the directory twice.

After doing this both test runs completed without complaint. Good luck!
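
The soft-link idea mentioned above was only a suspicion and was not tested in this thread. A minimal sketch of what it could look like follows. The function name link_expected_version is hypothetical, and the directory names 2020_05 and 2020_01 are the ones reported in the comments above.

```python
import os
from pathlib import Path


def link_expected_version(db_dir, actual="2020_05", expected="2020_01"):
    """Create db_dir/expected as a relative symlink to db_dir/actual.

    Returns the link path. Does nothing if 'expected' already exists.
    """
    db = Path(db_dir)
    link = db / expected
    if link.exists() or link.is_symlink():
        return link  # already present: leave it untouched
    if not (db / actual).is_dir():
        raise FileNotFoundError("no such directory: %s" % (db / actual))
    # A relative target keeps the link valid if the installation is moved.
    os.symlink(actual, link, target_is_directory=True)
    return link
```

For example, link_expected_version("data/hamap") run from the installation directory would make data/hamap/2020_01 resolve to data/hamap/2020_05, assuming that layout matches the installation in question.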

jacek-kominek commented 3 years ago

I know the issue has been solved, but I wanted to chime in since I recently had the same issue with InterProScan 5.52-86. The problem was that "initial_setup.py" ran without any errors, but the h3i index file for superfamily was a 0-byte file, even though the other files were generated just fine. After some troubleshooting, it turned out that running the script as "./initial_setup.py" rather than "python3 ./initial_setup.py" was loading Python 2 instead of Python 3, which resulted in the failed index. Re-running it explicitly under Python 3 solved the issue. I know that the Python 3 call is clearly stated in the docs, but since this is such a sneaky issue (the script runs without complaint under Python 2), it might be helpful to explicitly warn NOT to use Python 2 here, or to include a check in the script itself that verifies the running Python version and exits if it is Python 2. Just a thought.
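
The version check proposed here could be sketched as follows. This is not the actual contents of initial_setup.py. The function name require_python3 is hypothetical, and the message uses %-formatting rather than an f-string so that the file still parses under Python 2 and the check itself can run there.

```python
import sys


def require_python3(version_info=None):
    """Exit with a clear message unless the interpreter is Python 3 or newer."""
    if version_info is None:
        version_info = sys.version_info
    if version_info[0] < 3:
        raise SystemExit(
            "This script must be run with Python 3 (found %d.%d). "
            "Run it as: python3 initial_setup.py"
            % (version_info[0], version_info[1])
        )


# Call this before any other work so a Python 2 run stops immediately.
require_python3()
```

Placing the call at the top of a script turns a silent Python 2 run into an immediate, explicit failure.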

DrNavi commented 1 year ago

Hi, I downloaded InterProScan with:

wget https://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/5.60-92.0/interproscan-5.60-92.0-64-bit.tar.gz

I did not get any error during installation, but when I ran the initial_setup.py command afterwards, I got this result:

python3 initial_setup.py
python3: can't open file 'initial_setup.py': [Errno 2] No such file or directory

In addition, when I ran the test file using the command

./interproscan.sh -i test_all_appl.fasta -f tsv -dp

I got the output .tsv file, but during the analysis I got many warning messages like this:

2023-02-01 18:00:49,435 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-01 18:00:49,438 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: Query sequence: 1 matches PIRSF001789: Nerve growth factor, subunit beta
2023-02-01 18:00:49,438 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-01 18:00:49,438 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc
2023-02-01 18:00:49,438 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-01 18:00:49,438 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: 1 ! 339.3 1.4 1.1e-105 3.5e-102 1 252 [. 1 256 [. 1 257 [] 0.97
2023-02-01 18:00:49,438 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-01 18:00:49,438 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: Query sequence: 3 matches PIRSF001220: L-asparaginase/Glutamyl-tRNA(Gln) amidotransferase subunit D
2023-02-01 18:00:49,439 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-01 18:00:49,439 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc
2023-02-01 18:00:49,439 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-01 18:00:49,439 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: 1 ! 296.3 3.4 4.8e-92 5.3e-89 3 323 .. 48 365 .. 46 370 .] 0.96
2023-02-01 18:00:49,439 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-01 18:00:49,439 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: and matches Sub-Family PIRSF500176: L-asparaginase/L-glutaminase
2023-02-01 18:00:49,439 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-01 18:00:49,439 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc
2023-02-01 18:00:49,439 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-01 18:00:49,439 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: 1 ! 252.9 3.1 8.3e-79 9.1e-76 3 324 .. 50 367 .. 48 370 .] 0.91
01/02/2023 18:00:53:550 50% completed
01/02/2023 18:01:06:867 77% completed
01/02/2023 18:01:26:519 90% completed
01/02/2023 18:01:59:389 100% done: InterProScan analyses completed

I just wanted to know whether my installation was successful and whether I can use InterProScan for my own data. Also, why did I not find the initial_setup.py script in my installed version?

tgrego commented 1 year ago

Hello, initial_setup.py has been deprecated. You can now run this instead:

python3 setup.py interproscan.properties

About the error you're experiencing, please add the following line to your interproscan.properties file:

pirsf.pl.binary.switches=--outfmt i5

That should fix it and your installation should be good to run.
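
For anyone applying the properties change on several installations, it can be scripted. The sketch below is illustrative only. The helper name ensure_property is hypothetical, the property line is the one quoted in the comment above, and the path to interproscan.properties is assumed to be supplied by the caller. If the key is already present with any value, the file is left untouched so that an existing setting is never silently overwritten.

```python
from pathlib import Path

# The line quoted in the comment above.
PIRSF_LINE = "pirsf.pl.binary.switches=--outfmt i5"


def ensure_property(properties_path, line=PIRSF_LINE):
    """Append 'line' to the properties file unless its key is already set.

    Returns True if the file was modified, False if the key was found.
    """
    path = Path(properties_path)
    text = path.read_text()
    key = line.split("=", 1)[0]
    for existing in text.splitlines():
        if existing.strip().startswith(key + "="):
            return False  # key already set: review it by hand instead
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + line + "\n")
    return True
```

Calling ensure_property("interproscan.properties") from the installation directory appends the line once. Repeated calls are no-ops.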

DrNavi commented 1 year ago

Dear tgrego, thank you very much for the guidance. I followed your advice and did not see any warning messages this time. Once again, thank you for your kind help.

02/02/2023 18:37:23:788 Welcome to InterProScan-5.60-92.0
02/02/2023 18:37:23:789 Running InterProScan v5 in STANDALONE mode... on Linux
02/02/2023 18:37:27:368 RunID: navi_20230202_183727283_isqv
02/02/2023 18:37:33:646 Loading file /home/navi/software/my_interproscan/interproscan-5.60-92.0/test_all_appl.fasta
02/02/2023 18:37:33:647 Running the following analyses:
[AntiFam-7.0,CDD-3.20,Coils-2.2.1,FunFam-4.3.0,Gene3D-4.3.0,Hamap-2021_04,MobiDBLite-2.0,PANTHER-17.0,Pfam-35.0,PIRSF-3.10,PIRSR-2021_05,PRINTS-42.0,ProSitePatterns-2022_01,ProSiteProfiles-2022_01,SFLD-4,SMART-7.1,SUPERFAMILY-1.75,TIGRFAM-15.0]
Pre-calculated match lookup service DISABLED. Please wait for match calculations to complete...
02/02/2023 18:37:46:593 25% completed
02/02/2023 18:37:59:058 51% completed
02/02/2023 18:38:15:895 75% completed
02/02/2023 18:38:29:311 90% completed
02/02/2023 18:38:45:072 100% done: InterProScan analyses completed