A possibly similar error is appearing in the test-set run; it seems to dislike the raw PIRSF output.
./interproscan.sh -i test_all_appl.fasta -f tsv -dp
15/02/2023 12:19:48:774 Welcome to InterProScan-5.60-92.0
15/02/2023 12:19:48:777 Running InterProScan v5 in STANDALONE mode... on Linux
15/02/2023 12:19:56:983 RunID: n19-32-192-crossbones.hpc.hutton.ac.uk_20230215_121956651_5puw
15/02/2023 12:20:09:936 Loading file /mnt/shared/scratch/awebb/apps/interproscan/interproscan-5.60-92.0/test_all_appl.fasta
15/02/2023 12:20:09:937 Running the following analyses:
[AntiFam-7.0,CDD-3.20,Coils-2.2.1,FunFam-4.3.0,Gene3D-4.3.0,Hamap-2021_04,MobiDBLite-2.0,PANTHER-17.0,Pfam-35.0,PIRSF-3.10,PIRSR-2021_05,PRINTS-42.0,ProSitePatterns-2022_01,ProSiteProfiles-2022_01,SFLD-4,SMART-7.1,SUPERFAMILY-1.75,TIGRFAM-15.0]
Pre-calculated match lookup service DISABLED. Please wait for match calculations to complete...
15/02/2023 12:20:30:614 25% completed
2023-02-15 12:20:39,478 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,492 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: Query sequence: 1 matches PIRSF001789: Nerve growth factor, subunit beta
2023-02-15 12:20:39,492 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,492 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc
2023-02-15 12:20:39,492 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,492 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: 1 ! 339.3 1.4 1.1e-105 3.5e-102 1 252 [. 1 256 [. 1 257 [] 0.97
2023-02-15 12:20:39,492 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,492 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: Query sequence: 3 matches PIRSF001220: L-asparaginase/Glutamyl-tRNA(Gln) amidotransferase subunit D
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: 1 ! 296.3 3.4 4.8e-92 5.3e-89 3 323 .. 48 365 .. 46 370 .] 0.96
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: and matches Sub-Family PIRSF500176: L-asparaginase/L-glutaminase
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: # score bias c-Evalue i-Evalue hmmfrom hmm to alifrom ali to envfrom env to acc
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: 1 ! 252.9 3.1 8.3e-79 9.1e-76 3 324 .. 50 367 .. 48 370 .] 0.91
15/02/2023 12:20:47:882 50% completed
15/02/2023 12:21:21:467 75% completed
15/02/2023 12:21:59:690 90% completed
15/02/2023 12:22:58:830 100% done: InterProScan analyses completed
Some PIRSF lines do appear in the test output, though I can't tell whether they are what is supposed to be there or whether they are complete:
grep PIRSF test_all_appl.fasta_1.tsv
# yields:
UPI0004FABBC5 92e4b89dd86f8ab828f57121f6d7d460 257 PIRSF PIRSF001789 NGFB 1 257 3.5E-102 T 15-02-2023 IPR020408 Nerve growth factor-like
UPI0002E0D40B f91cb3cf61f2d7c7f5aaf6ea04e07868 370 PIRSF PIRSF001220 L-ASNase_gatD 46 370 5.3E-89 T 15-02-2023 IPR006034 Asparaginase/glutaminase-like
UPI0002E0D40B f91cb3cf61f2d7c7f5aaf6ea04e07868 370 PIRSF PIRSF500176 L_ASNase 48 370 9.1E-76 T 15-02-2023 - -
The test run without the -dp flag completed without any issues.
Everything was installed and run on an HPC cluster; OS: Rocky Linux 8.
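For anyone who wants to inspect the PIRSF rows in more detail than grep allows, here is a minimal Python sketch. It assumes the .tsv file is tab-separated with the 13 columns visible in the rows above (the listing in this thread shows them space-flattened); the column labels and the `rows_for_analysis` helper are names made up for this example, not part of InterProScan.

```python
import csv
import io

# Labels chosen for this sketch; they describe the 13 fields seen in the rows above.
COLUMNS = [
    "protein", "md5", "length", "analysis", "signature", "description",
    "start", "stop", "score", "status", "date", "interpro", "interpro_desc",
]


def rows_for_analysis(handle, analysis):
    """Yield one dict per TSV row whose analysis column (4th field) matches."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) > 3 and fields[3] == analysis:
            yield dict(zip(COLUMNS, fields))


# The three PIRSF rows reported above, re-joined with tabs (assumed layout).
sample = (
    "UPI0004FABBC5\t92e4b89dd86f8ab828f57121f6d7d460\t257\tPIRSF\tPIRSF001789\t"
    "NGFB\t1\t257\t3.5E-102\tT\t15-02-2023\tIPR020408\tNerve growth factor-like\n"
    "UPI0002E0D40B\tf91cb3cf61f2d7c7f5aaf6ea04e07868\t370\tPIRSF\tPIRSF001220\t"
    "L-ASNase_gatD\t46\t370\t5.3E-89\tT\t15-02-2023\tIPR006034\t"
    "Asparaginase/glutaminase-like\n"
    "UPI0002E0D40B\tf91cb3cf61f2d7c7f5aaf6ea04e07868\t370\tPIRSF\tPIRSF500176\t"
    "L_ASNase\t48\t370\t9.1E-76\tT\t15-02-2023\t-\t-\n"
)

pirsf_rows = list(rows_for_analysis(io.StringIO(sample), "PIRSF"))
for row in pirsf_rows:
    print(row["protein"], row["signature"], row["start"], row["stop"], row["score"])
```

To run it on a real result file, pass `open("test_all_appl.fasta_1.tsv", newline="")` instead of the in-memory sample. Note that the three signatures and E-values in the output correspond to the three hits named in the warning lines, which suggests the matches were still recorded despite the warnings.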
Maybe try this solution posted at https://github.com/ebi-pf-team/interproscan/issues/173#issuecomment-1412069679. I had a similar warning with the test dataset, and I could run InterProScan on the test set without any warnings after following their solution.
The above-mentioned solution worked for me after I got the same error as the poster.
Thank you!
I tried the solution suggested by @Tsylvester8 (adding pirsf.pl.binary.switches=--outfmt i5 to interproscan.properties and re-running python3 setup.py interproscan.properties), but I'm still getting the same errors on my dataset. I also didn't have any warnings, errors, or problems running the test data to begin with.
I also created a new conda InterProScan environment, just in case the python3 setup.py interproscan.properties command doesn't overwrite the previous setup. The "Ignoring line with unexpected format" warnings persist.
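One thing that may be worth ruling out when the fix appears to have no effect is whether the switch is really present in the properties file being edited. The sketch below is only a simplified reader for plain key=value lines; real Java properties files also allow `:` separators, `!` comments, and line continuations, which it ignores. `read_properties` and `pirsf_switch` are hypothetical helper names, and the sketch does not try to work out which interproscan.properties a given installation actually reads.

```python
from pathlib import Path


def read_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def pirsf_switch(path):
    """Return the pirsf.pl.binary.switches value from a file, or None if unset."""
    return read_properties(Path(path).read_text()).get("pirsf.pl.binary.switches")


# In-memory demo using the line quoted in this thread.
example = "# fragment for illustration only\npirsf.pl.binary.switches=--outfmt i5\n"
switch_value = read_properties(example).get("pirsf.pl.binary.switches")
print(switch_value)
```

Calling `pirsf_switch("interproscan.properties")` in the installation directory returns `None` if the line is missing or commented out.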
I solved the problem by installing an older InterProScan version without conda. I suspect the conda installation might be the issue here, but it could be version-specific too, as I described here: https://github.com/slt666666/NLRtracker/issues/14#issuecomment-1443414467.
I have installed InterProScan version 5.59-91.0 using conda and run it on a cluster. However, the tool skips analysis of some of the protein sequences, reportedly due to an unexpected format:
When checking the output, some of the proteins are missing, and they are the ones I'm most interested in (predicted nucleotide-binding leucine-rich repeat proteins). I'm not sure whether it is a bug, but I was unable to find out what 'unexpected format' actually means or how to change it so that the tool doesn't exclude those sequences from the analysis.
For reproducibility of the warning messages, the protein sequences I'm using can be obtained here: https://kiwifruitgenome.org/ftp/A_chinensis/Hongyang/v3.0/Hongyang_pep_v3.0.fa.gz
PS: On test data everything works OK and I don't get any warning messages.
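To pin down exactly which proteins are absent from the output, a small comparison of FASTA IDs against the first TSV column can help. This is a rough sketch under two assumptions: that the TSV uses the first whitespace-delimited token of each FASTA header as the protein ID, and that the TSV is tab-separated. The function names and the `seqA`/`seqB` records are invented for the demo. Keep in mind that, as far as I understand, a protein with no matches at all would also be absent from the TSV, so a missing ID is a lead to follow up rather than proof that the sequence was skipped.

```python
import io


def fasta_ids(handle):
    """Yield the first token of every FASTA header line."""
    for line in handle:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                yield parts[0]


def tsv_ids(handle):
    """Yield the first tab-separated field of every non-empty TSV line."""
    for line in handle:
        first = line.rstrip("\n").split("\t")[0]
        if first:
            yield first


def missing_ids(fasta_handle, tsv_handle):
    """Return FASTA IDs, in input order, that never appear in the TSV."""
    seen = set(tsv_ids(tsv_handle))
    return [seq_id for seq_id in fasta_ids(fasta_handle) if seq_id not in seen]


# Hypothetical two-sequence demo; replace with real file handles.
demo_fasta = io.StringIO(">seqA some description\nMKT\n>seqB\nMAA\n")
demo_tsv = io.StringIO("seqA\tmd5placeholder\t3\tPfam\tPF00001\n")
absent = missing_ids(demo_fasta, demo_tsv)
print(absent)
```

For the gzipped protein file linked above, `gzip.open(path, "rt")` from the standard library can supply the FASTA handle.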