ebi-pf-team / interproscan

Genome-scale protein function classification
Apache License 2.0

WARN - Ignoring line with unexpected format #310

Closed: marketavlkova closed this issue 4 months ago

marketavlkova commented 1 year ago

I installed InterProScan version 5.59-91.0 using conda and ran it on a cluster. However, the tool skips the analysis of some of the protein sequences, reportedly due to an unexpected format:

interproscan.sh -i ../../data/output/rgenes/NLRtracker/Acc-Hongyang_v3/tmp.fasta -f gff3 -t p -o ../../data/output/rgenes/NLRtracker/Acc-Hongyang_v3/interpro_result.gff -cpu 8 -appl Pfam,Gene3D,SUPERFAMILY,PRINTS,SMART,CDD,ProSiteProfiles
13/02/2023 17:16:11:317 Welcome to InterProScan-5.59-91.0
13/02/2023 17:16:11:318 Running InterProScan v5 in STANDALONE mode... on Linux
13/02/2023 17:16:17:037 RunID: node524_20230213_171616862_1ude
13/02/2023 17:16:27:877 Loading file /ebio/ag-mccann/projects/act_evo/code/rgenes/../../data/output/rgenes/NLRtracker/Acc-Hongyang_v3/tmp.fasta
13/02/2023 17:16:27:879 Running the following analyses:
[CDD-3.18,Gene3D-4.3.0,Pfam-35.0,PRINTS-42.0,ProSiteProfiles-2022_01,SMART-7.1,SUPERFAMILY-1.75]
Pre-calculated match lookup service DISABLED.  Please wait for match calculations to complete...
13/02/2023 17:16:36:174 Uploaded 40329 unique sequences for analysis
13/02/2023 17:19:23:821 25% completed
2023-02-13 17:20:31,154 [amqEmbeddedWorkerJmsContainer-7] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 33483    0043429 34-229  1.20e-52    2SFLKTGTTIVGLVFQDGVILGADTRATEGPIVadkNCEKIHFMAPNIYCCGAGTAADTEAVTDMVSSQLKLHRYHTGRESRVVTALTLLKSHLFSYQGY-----VQAALVLGGVDVTGPHLHTIYPHGSTDTLPFATMGSGSLAAMAIFESKYRE----GLTRDEGVNLVTEAICSGIFNDLGSGSNVDVCVITKGNTEYLRNHK  2.18e-20    71351
2023-02-13 17:20:31,160 [amqEmbeddedWorkerJmsContainer-7] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 33700    0043429 14-208  6.10e-49    3SSTSLGIKAANGVIIATEKKLPSILVDesSVQKIQILTPNIGVVYSGMGPDSRVLVRKSRKQAEQYHRLYKEPIPVTQLVRETAAVMQEFTQSGGVRPFGVSLLVAGFDGQGF----NNYTGSYFSWKASAMGKNVSNAKTFLEKRYTD----DMELDDAVHTAILTLKEGFEGQ-ISEKNIEIGIIGNDRKFRVLTPSEIADY   4.39e-20    71351
2023-02-13 17:20:31,223 [amqEmbeddedWorkerJmsContainer-7] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 19058    0043429 38-179  2.30e-25    8ARQLIYQHQHNKQMSCPAMAQLLSNTLYYKRFF----PYYAFNVLGGLDNEGkGCVFTYDAVGSYERVGYSSQGSGSTLIMPFLDNQLKSPSPLllpaqdavtPLSEAEAIDLVKTCFASATERDIYTGDKLEIVVLNADDHPRTF 1.79e-20    71351
2023-02-13 17:20:31,238 [amqEmbeddedWorkerJmsContainer-7] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 20551    0043429 2-203   2.00e-47    1FRNQYDTDVTTWSPQGRLFQVEYAMEAVKQGSAAIGLRSKTHAVLASVNKAQSELSSHQKKIFKVDHHIGVAIAGLTADGRVLSRYMRSECINYSYTYESPLPVGRLVVQLADKAQVCTQRSWKRQYGVAFAIGSARKP--------PKG------------------TYMERRFESFV--SSTREDLLKDALFALRETLQGEKLKSSICTVAVVGVGEEFHILDNETVQ 1.86e-19    71351
2023-02-13 17:20:31,238 [amqEmbeddedWorkerJmsContainer-7] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 20627    0043429 2-215   2.90e-66    1FRNQYDTDVTTWSPQGRLFQVEYAMEAVKQGSAAIGLRSKTHVVLASVNKAQSELSSHQKKIFKVDHHIGVAIAGLTADGRVLSRYMRSECINYSYTYESPLPVGRLVVQLADKAQVCTQRSWKRPYGVGLLVAGLDESGAHLYYNCPSGNYFEYQAFAIGSRSQAAKTYMERRFEGFV--SSTREDLLKDALFALRETFREKSSRAQYAQFAVVG   3.35e-19    71351
2023-02-13 17:20:31,250 [amqEmbeddedWorkerJmsContainer-7] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 39749    0043429 3-55    3.60e-15    4RYDRVITVFSPGGHFFQFEYALESVRKGNAIVGVRGTDAIVLDVEKKFASKLQ  2.76e-20    71351
2023-02-13 17:20:31,256 [amqEmbeddedWorkerJmsContainer-7] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 37583    0043429 10-215  4.90e-45    2PYDNNGGTCVAIAGANYCVIAADTRMSTGYSIltrDYSKICKLADKSVMASSGFQADVRALQKVLAARHLIYQHQHNKQMSCPAMAQLLSNTLYYKRFF----PYYAFNVLGGLDNEGkGCVFTYDAVGSYERVGYSSQGSGSTLIMPFLDNQLKSPSPLllpakdavtPLSEPEAIDLVKTCFASATERDIYTGDKLEIVVLNAAGIRS 2.03e-20    71351
2023-02-13 17:20:31,256 [amqEmbeddedWorkerJmsContainer-7] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 37724    0043429 129-318 3.00e-26    3TGTSVIGIKYKDGILMTADMGGSYGSTLrykSVERMKPVGKHSLLGASGEISDFQEILSYLDELILYDNMWDDgNSLGPKEVHNYLTRVMYNRRNK--FNPLWNSLVLGGIKNGQKYLGTVNMIGVHFEDNHVATGFGNHLARPILRDEWHE----NLSFEEGVKLLEKCMRVLLYRDRSAVNKLQIAKITEEGVT   1.78e-20    71351
2023-02-13 17:20:31,266 [amqEmbeddedWorkerJmsContainer-7] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 4184 0043429 3-104   2.90e-26    127 LSVGTMIAGWDETGPGLYYVDSEGGRLKGMRFSVGSGSPYACGVLDNGYRY----DMSVEEAAELARRSIYHATFRDGASGSVASVYYVGPNGWKKLSGDDVADL1.90e-20   71351
2023-02-13 17:20:31,267 [amqEmbeddedWorkerJmsContainer-7] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 4598 0043429 77-146  7.50e-12    155 EDFVVAGTASESLYGACEAMFKP----DMEAEELFETISQALLSSVDRDCLSGWGGHVYVVTPTEVTERILKGR  1.70e-20    71351
2023-02-13 17:20:31,281 [amqEmbeddedWorkerJmsContainer-7] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 11301    0043429 4-222   3.30e-67    1IGTGYDLSVTTFSPDGRVFQIEYAAKAVDNTGTVVGIKCTDGIVMGVEKLIASKMMLpgSNRRIHSVHRHSGMAVAGLAADGRQIVARAKAEATNYESVYGEPIPVKELAERVASYVHLCTLYWWLRPFGCGIILGGYDRGGPQLYMVEPSGISYRYFGAAIGKGKQAAKTEIEKLKLS----EMTCREGIIEIAKIIYKVHDEAKDKAFELEMSWICDESKR    5.89e-20    71351
2023-02-13 17:20:34,018 [amqEmbeddedWorkerJmsContainer-5] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 7285 0043429 7-238   3.50e-68    3AGYDRHITIFSPEGRLFQVEYAFKAVKSAGiTSIGVRGKDSVCVVTQKKVPDKLLDdaCVTHLFPITKYIGLLATGMTADARSLVQQARNEAAEFRFRYGYEMPVDVLARWIADKAQVYTQHAYMRPLGVVAMVLSVDEEkGPQLFKCDPAGHFYGHKATSAGLKEQEAINFLEKKMKNDP--AFSYEETVQTAISALQSVLQED-FKATEIEVGVVRKENTvFRVLSTEEIDEH    6.41e-20    71351
2023-02-13 17:20:34,019 [amqEmbeddedWorkerJmsContainer-5] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 7636 0043429 5-224   5.30e-72    2RTEYDRGVNTFSPEGRLFQVEYAIEAIKLGSTAIGLKTKEGVVLAVEKRITSPLLEpsSVEKIMEIDEHIGCAMSGLIADARTLVEHARVETQNHRFSYGEPMTVESTTQALCDLALRFGegdEESMSRPFGVSLLIAGHDENGPSLYYTDPSGTFWQCNAKAIGSGSEGADSSLQEQYNK----DITLQEAETIALSILKQVMEEK-VTPNNVDIAKVSPTYH--L    1.02e-19    71351
2023-02-13 17:20:40,489 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 17435    0043429 7-202   1.30e-44    3NGSALVAMVGKNCFAIASDRRLGVQLQTiatDFQRIYKIHDKLFIGLAGLATDAQTLYQKLVFRHKLYQLREERDMKPQTFASLVSALLYEKRFG----PFFCQPVIAGLGDEdKPFICTMDSIGAKElAKDFVVAGTASESLYGACEAMFKP----DMEAEELFETISQALLSSVDRDCLSGWGGHVYVVTPTEVTERILKGR   2.03e-20    71351
2023-02-13 17:20:40,499 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 8654 0043429 12-205  1.50e-49    2HSMGTTIIGVTYNGGVVLGADSRTSTGMYVanrASDKITQLTDNVYLCRSGSAADSQIVSDYVRYYLHQHTIQLGQPATVKVAANLVRLISYNNKNR-----LQTGMIVGGWDKYeGGKIYGVPLGGTVIEQPFAIGGSGSSYLYGFFDQEWKD----GMTKDEAEKLVVKAVSLAIARDGASGGVVRTVIINSEGVTRNFYP    2.60e-20    71351
2023-02-13 17:20:40,508 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 3034 0043429 7-165   3.80e-46    3AGYDRHITIFSPEGRLFQVEYAFKAVKSAGiTSIGVRGKDSVCVVTQKKVPLWMILggGSPTCSLLRSTIGLLATAMTADARSLVQQARNEAAEFRFRYGYEMPVDVLARWIADKAQVYTQHAYMRPLGVVAMVLSVDEEkGPQLFKCDPAGHFYGHKV    3.90e-20    71351
2023-02-13 17:20:40,510 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 3827 0043429 7-202   4.60e-45    3NGSALVAMVGKNCFAIASDRRLGVQLQTiatDFQRIYKIHEKLFIGLSGLATDAQTLYQRLVFRHKLYQLREERNMKPETFASLVSAILYEKRFG----PYFCQPVIAGLGDDnKPFICTMDSIGAKElAKDFVVAGTASESLYGACEAMFKP----DMEAEELFETVSQALLSSVDRDCLSGWGGHVYVVTPTEVTERILKGR   2.03e-20    71351
2023-02-13 17:20:40,515 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 392  0043429 4-222   4.30e-68    1IGTGYDLSVTTFSPDGRVFQIEYAAKAVDNSGTVVGIKCKNGIVMGVEKLIASKMMLpgSNRRIHSVHRHSGMAVAGLAADGRQIVARAKAEATNYESVYGEPIPVKELAERVASYVHLCTLYWWLRPFGCGIILGGYDREGPQLYMVEPSGISYRYFGAAIGKGKQAAKTEIEKLKLS----EMTCREGIIEIAKIIYKVHDEAKDKAFELEMSWICDESKR    6.00e-20    71351
2023-02-13 17:20:40,517 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 1244 0043429 29-221  2.00e-27    3TGTSVIGIKYKDGILMVADMGGSYGSTLrykSVERMKPVGKHSLLGASGEISDFQEILRYLDELILYDNMWDDgNSLGPKEVHNYLTRVMYNRRNK--FNPLWNSLVLGGVKNGQKYLGMVSMIGVHFEDNHVATGFGNHLARPILRDEWHE----NLSFEDGVKLLEKCMRVLLYRDRSAVNKLQIAKITEGVWTFRS    1.77e-20    71351
2023-02-13 17:20:40,517 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 1684 0043429 596-790 4.70e-52    2FLKTGTTIVGLIFQDGVILGADTRATEGPIVadkNCEKIHYMAPNIYCCGAGTAADTEAVTDMVSSQLQLHRYHTGRESRVVTALTLLKSHLFSYQGY-----VQAALVLGGVDVTGPHLHTIYPHGSTDTLPFATMGSGSLAAMAVFESKYRE----GLTRDEGVKLVTEAICSGIFNDLGSGSNVDVCVITKGQTEYLRNHQ   2.26e-20    71351
2023-02-13 17:20:40,541 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 22766    0043429 54-251  3.70e-57    3KGTTTLAFIFKEGVMVAADSRASMGGYIssqSVKKIIEINPYMLGTMAGGAADCQFWHRNLGIKCRLHELANKRRISVTGASKLLANILYNYRGM----GLSVGTMIAGWDETGPGLYYVDSEGGRLKGMRFSVGSGSPYAYGVLDNGYRY----DMSVEEAAELARRSIYHATFRDGASGGVASVYYVGPDGWKKLSGDDVAELH 2.47e-20    71351
2023-02-13 17:20:40,790 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 14031    0043429 498-716 1.30e-70    2RTEYDRGVNTFSPEGRLFQVEYAIEAIKLGSTAIGLKTKEGVVLAVEKRITSPLLEpsSVEKIMEIDEHIGCAMSGLIADARTLVEHARVETQNHRFSYGEPMTVESTTQALCDLALRFGegdEESMSRPFGVSLLIAGHDENGPSLYYTDPSGTFWQCNAKAIGSGSEGADSSLQEQYNK----DITLQEAETIALSILKQVMEEK-VTPNNVDIAKVSPTYH   1.02e-19    71351
2023-02-13 17:20:40,792 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 15108    0043429 12-205  1.60e-49    2HSMGTTIIGVTYDGGVILGADSRTSTGMYVanrASDKITKLTDNIYLCRSGSAADSQIVSDYVRYYLHQHTIRLGQPATVKVAANLVRLISYNNKNM-----LQTGLIVGGWDKYnGGKIYGVPLGGTIIEQPFAIGGSGSTYLYGFFDQEWKD----GMTKDEAEKLVVKAVSLAIARNGASGGVVRTVIINSEGVTRNFYP    2.56e-20    71351
2023-02-13 17:20:40,792 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 15115    0043429 54-251  2.10e-57    3KGTTTLAFIFKEGVMVAADSRASMGGYIssqSVKKIIEINPYMLGTMAGGAADCQFWHRNLGIKCRLHELANKRRISVTGASKLLANILYNYRGM----GLSVGTMIAGWDETGPGLYYVDSEGGRLKGMRFSVGSGSPYAYGVLDNGYRY----DMSVDEAAELARRAIYHATFRDGASGGVASVYYVGPDGWKKLSGDDVAELH 2.58e-20    71351
2023-02-13 17:20:40,793 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 15565    0043429 3-93    1.20e-22    4RYDRAITVFSPDGHLFQVEYALEAVRKGNAAVGVRGTDTIVLGVEKKSTAKLQDsrSVRKIVNWITHC-IGWRGSKQHARVLINRASIECQR   3.53e-20    71351
2023-02-13 17:20:40,793 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 15767    0043429 14-224  3.70e-23    2NGGTCGFCYCGCQFCVLAARHSTVLVTVIliltrDYSKICKLADKSVMASSGFQADVRALQKVLAARHLVRIQNFCFLNFILIVLHKVLDVGGGLFFFSfLCWCLYAFNVLGGLDNEGkGCVFTYDAVGSYERVGYSSQGSGSTLIMPFLDNQLKSPSPLllpakdavtPLSEPEAIDLVKTCFASATERDIYTGDKLEIVVLNADGIRSE    1.81e-20    71351
2023-02-13 17:20:40,801 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 26061    0043429 3-113   4.60e-22    2REPYDTDSMTWSPLGQL----YAMKAVKQGAEAIGLRSKTHVVLVSVNKVQSRLSEHQTKIFEVDDYI-----GVASDGSFLSRLLRSECINYSYTYGSPIPVGRLIVRIADKAQEPRAN   5.27e-20    71351
2023-02-13 17:20:40,802 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 26555    0043429 34-157  4.90e-23    5SGQLIarSVRKIVNLDDHIALACTGLKADARVLINRARIECQSHRLTVEDP--------------------------------------------TDPSGTFSAWKANATGRNSNSVREFLEKNYKE-----PSGQETVKLAICALLEVVE---SGGKNIEVAVMTKEHGLWQLDE   2.54e-20    71351
2023-02-13 17:20:40,803 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 26956    0043429 3-170   2.60e-39    3CVFGLVGKGFALVVADTSAVHSILLhkcNEDKIMVLDSHKLMGASGEAGDRAQFTEYIQKNVALYQFRNGIPLTTAAAANFTRGELATALRKN---PYSVNILLAGYDKEtGPSLYYIDYISTLHKVDKAAFGYGSYFSLAMMDRHYRS----DMSLEEAVDLVDKCIMETGVRL    2.27e-20    71351
2023-02-13 17:20:40,805 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 27726    0043429 13-145  9.30e-29    7TAADSQIVSDYVRYYLHQHTIQLGQPATVKVAANLVRLLSYNNKNR-----LQTGMIVGGWDKYeGGKIYGVPLGGTVIEQPFAIGGSGSSYLYGFFDQEWKD----GMTKDEAEKFVVKAVSLAIARDGASGGVVRTVIVS 2.22e-20    71351
2023-02-13 17:20:40,815 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 24612    0043429 74-244  3.70e-33    4CVCVVTQKKVADKLLDqtSVTHLFPITKYLGLLATGMTADARTLVQQARNESAEFRFRYGHEMPVDVLARwyMIADKSQVYTQHAYMRPLGRFVdMVLGIDDEfGPRLFKCDPAGHFFGHKATSAGLKEQEAINFLEKKMKN---------------------------DSAFSYEVGVVRKENPaFSVLSAEEIDEH 3.26e-20    71351
2023-02-13 17:20:40,817 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 25234    0043429 3-194   2.70e-43    3CVFGLVGKGFALVVADTSAVHSILLhktNEDKIMVLDSHKLMGASGEAGDRAQFTEYIQKNVALYQFRNGIPLTTAAAANFTRGELATALRKN---PYSVNILLAGYDKEtGPSLYYIDYISTLHKVDKAAFGYGSYFSLSMMDRHYHS----DMSLEEAVDLVDKCIIEIRSRLVIAPPNFVIKIVDQDGAREHAWRE    2.36e-20    71351
2023-02-13 17:20:40,817 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 25327    0043429 4-208   1.80e-63    3SQYSFSLTTFSPSGKLVQIEHALTAVGSGQTSLGIKAANGVIIATEKKLPSILVDesSVQKIQILTPNIGVVYSGMGPDSRVLVRKSRKQAEQYHRLYKEPIPVTQLVRETAAVMQEFTQSGGVRPFGVSLLVAGFDDKGPQLYQVDPSGSYFSWKASAMGKNVSNAKTFLEKRYTD---------------------------ISEKNIEIGIIGNDRKFRVLTPSEIADY   6.44e-20    71351
2023-02-13 17:20:40,845 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 35077    0043429 47-278  1.10e-67    3AGYDRHITIFSPEGRLFQVEYAFKAVKAAGiTSIGVRGKDSVCVVTQKKVPDKLLDqtSVTHLFPITKYLGLLATGMTADARTLVQQARNEAAEFRFRYGYEMPVDVLARWIADKSQVYTQHAYMRPLGIVAMVLGIDDEfGPRLFKCDPAGHFFGHKATSAGLKEQEAINFLEKKMKNDP--AFSYEETVQTAISALQSVLQED-FKASEIEVGVVCKENPaFRVLSTEEIDEH    6.80e-20    71351
2023-02-13 17:20:41,109 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 12001    0043429 3-191   4.70e-43    3CVFGLVGKGFALVVADASAVHSILLhksNEDKIMILDSHKLMGASGEAGDRAQFTEYIQKNVALYQFRNGIPLTTAAAANFTRGELATALRKS---PYFVNIILAGYDKEtGPSLYYIDYIASMHKVDKAAFGYGSYFSLAMMDRHYHS----NMTLEEAIDLVDKCIIEIQSRLVVAPPNFVIKIVDQDGAREYG   2.20e-20    71351
2023-02-13 17:20:41,109 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 12120    0043429 42-96   6.20e-05    148 QVGTLFEYQAFAIGSRSQAAKTYLECTYESFV--RSLREDLLKDALFELRETSQGEK   2.47e-20    71351
2023-02-13 17:20:41,110 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 12596    0043429 7-202   5.30e-45    3NGSAIVAMVGKNCFAIASDRRLGVQLQTiatDFQRIYKIHDKLFIGLAGLATDAQTLYQKLVFRHKLYQLREERDMKPQTFASLVSALLYEKRFG----PFFCQPVIAGLGDEdKPFICTMDSIGAKElAKDFVVAGTASESLYGACEAMFKP----DMEAEELFETISQALLSSVDRDCLSGWGGHVYVVTATEVTERILKGR   2.03e-20    71351
2023-02-13 17:20:41,110 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 12669    0043429 8-215   2.20e-45    1SP------------YDNNGGTCVAIAGADYCVIAADTRMSTGYSIltrDYSKICQLAEKSVMASSGFQADVRALQKVLAARHLIYQHQHNKQMSCPAMAQLLSNTLYYKRFF----PYYAFNVLGGLDNEGkGCVFTYDAVGSYERVGYSSQGSGSTLIMPFLDNQLKSPSPLllpaqdavtPLSELEAIDLVKTCFASATERDIYTGDKLEIVVLNADGIRRE   2.00e-20    71351
2023-02-13 17:20:41,113 [amqEmbeddedWorkerJmsContainer-3] [uk.ac.ebi.interpro.scan.io.superfamily.match.SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: 13959    0043429 187-381 3.40e-51    2FLKTGTTIVGLIFQDGVILGADTRATEGPIVadkNCEKIHYMAPNIYCCGAGTAADTEAVTDMVSSQLQLHRYHTGRESRVVTALTLLKSHLFSYQGY-----VQAALVLGGVDVTGPHLHTIYPHGSTDTLPFATMGSGSLAAMAVFESKYHE----GLTRDGGVRLVTEAICSGIFNDLGSGSNVDVCVITKGQTEYLRNHQ   2.15e-20    71351
13/02/2023 17:20:56:302 50% completed
13/02/2023 17:23:38:277 75% completed
13/02/2023 17:26:07:207 90% completed
13/02/2023 17:29:08:771 100% done:  InterProScan analyses completed 

2023-02-13 17:29:09,273 [main] [uk.ac.ebi.interpro.scan.jms.main.Run:1801] WARN - deleteWorkingDirectoryOnCompletion : false

When I check the output, some of the proteins are missing, and they are the ones I'm most interested in (predicted nucleotide-binding leucine-rich repeat proteins). I'm not sure whether this is a bug, but I was unable to find out what 'unexpected format' actually means or how to change the input so that the tool doesn't exclude those sequences from the analysis.
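To see exactly which input proteins have no feature lines in the result file, the FASTA headers can be compared with the first column of the GFF3 output. This is a minimal Python sketch, not part of InterProScan: it assumes that column 1 of the GFF3 holds the FASTA ID (the header text up to the first whitespace) and that any embedded sequences follow a `##FASTA` line. The function names are made up for illustration.

```python
import sys
from pathlib import Path


def fasta_ids(fasta_text):
    """Return the first whitespace-delimited token of every FASTA header."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].split()
            if header:
                ids.append(header[0])
    return ids


def gff3_seqids(gff3_text):
    """Return the column-1 sequence IDs of all feature lines in a GFF3 file."""
    seen = set()
    for line in gff3_text.splitlines():
        if line.startswith("##FASTA"):
            break  # assumed: only embedded sequences follow this directive
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        seen.add(line.split("\t", 1)[0])
    return seen


def missing_proteins(fasta_text, gff3_text):
    """List FASTA IDs (in input order) that have no feature line in the GFF3."""
    reported = gff3_seqids(gff3_text)
    return [pid for pid in fasta_ids(fasta_text) if pid not in reported]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 missing.py tmp.fasta interpro_result.gff
    fasta = Path(sys.argv[1]).read_text()
    gff3 = Path(sys.argv[2]).read_text()
    for protein_id in missing_proteins(fasta, gff3):
        print(protein_id)
```

A protein may also be absent simply because none of the selected analyses matched it, so the resulting list is a starting point for comparison with the warnings rather than proof that a sequence was skipped.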

For reproducibility of the warning messages, the protein sequences I'm using can be obtained here: https://kiwifruitgenome.org/ftp/A_chinensis/Hongyang/v3.0/Hongyang_pep_v3.0.fa.gz

PS: On test data everything works OK and I don't get any warning messages.
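The warnings in the log above can be summarized to check whether they share a common pattern. In the excerpt every ignored line carries `0043429` as its second field, which suggests a single model is involved. The following Python sketch tallies the warnings per parser class and second field. Interpreting that field as a SUPERFAMILY model ID is an assumption, and the regular expression is written against the log format shown here only.

```python
import re
from collections import Counter

# Matches e.g. "[uk.ac...SuperFamilyHmmer3MatchParser:181] WARN - Ignoring line with unexpected format: <raw>"
WARN_RE = re.compile(
    r"\[(?P<parser>[\w.]+):\d+\]\s+WARN - Ignoring line with unexpected format:\s*(?P<raw>.*)"
)


def summarize_ignored_lines(log_text):
    """Count ignored raw-match lines per (parser class, second raw field)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARN_RE.search(line)
        if not match:
            continue
        parser = match.group("parser").rsplit(".", 1)[-1]  # short class name
        fields = match.group("raw").split()
        # Assumption: the second field identifies the model that produced the hit.
        second = fields[1] if len(fields) > 1 else "?"
        counts[(parser, second)] += 1
    return counts
```

Calling `summarize_ignored_lines(open("interproscan.log").read())` on a saved copy of the run output (the file name is hypothetical) returns a `Counter` whose keys show at a glance whether one model or many are affected.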

webbchen commented 1 year ago

A possibly similar error appears in the test-set run; the parser seems to dislike the raw PIRSF output.

./interproscan.sh -i test_all_appl.fasta -f tsv -dp
15/02/2023 12:19:48:774 Welcome to InterProScan-5.60-92.0
15/02/2023 12:19:48:777 Running InterProScan v5 in STANDALONE mode... on Linux
15/02/2023 12:19:56:983 RunID: n19-32-192-crossbones.hpc.hutton.ac.uk_20230215_121956651_5puw
15/02/2023 12:20:09:936 Loading file /mnt/shared/scratch/awebb/apps/interproscan/interproscan-5.60-92.0/test_all_appl.fasta
15/02/2023 12:20:09:937 Running the following analyses:
[AntiFam-7.0,CDD-3.20,Coils-2.2.1,FunFam-4.3.0,Gene3D-4.3.0,Hamap-2021_04,MobiDBLite-2.0,PANTHER-17.0,Pfam-35.0,PIRSF-3.10,PIRSR-2021_05,PRINTS-42.0,ProSitePatterns-2022_01,ProSiteProfiles-2022_01,SFLD-4,SMART-7.1,SUPERFAMILY-1.75,TIGRFAM-15.0]
Pre-calculated match lookup service DISABLED.  Please wait for match calculations to complete...
15/02/2023 12:20:30:614 25% completed
2023-02-15 12:20:39,478 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,492 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: Query sequence: 1 matches PIRSF001789: Nerve growth factor, subunit beta
2023-02-15 12:20:39,492 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,492 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
2023-02-15 12:20:39,492 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,492 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: 1 !  339.3   1.4  1.1e-105  3.5e-102       1     252 [.       1     256 [.       1     257 [] 0.97
2023-02-15 12:20:39,492 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,492 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: Query sequence: 3 matches PIRSF001220: L-asparaginase/Glutamyl-tRNA(Gln) amidotransferase subunit D
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: 1 !  296.3   3.4   4.8e-92   5.3e-89       3     323 ..      48     365 ..      46     370 .] 0.96
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: and matches Sub-Family PIRSF500176: L-asparaginase/L-glutaminase
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: #    score  bias  c-Evalue  i-Evalue hmmfrom  hmm to    alifrom  ali to    envfrom  env to     acc
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:129] WARN - Couldn't parse the given raw match line, because it is of an unexpected format.
2023-02-15 12:20:39,493 [amqEmbeddedWorkerJmsContainer-6] [uk.ac.ebi.interpro.scan.io.pirsf.hmmer3.PirsfHmmer3RawMatchParser:130] WARN - Unexpected Raw match line: 1 !  252.9   3.1   8.3e-79   9.1e-76       3     324 ..      50     367 ..      48     370 .] 0.91
15/02/2023 12:20:47:882 50% completed
15/02/2023 12:21:21:467 75% completed
15/02/2023 12:21:59:690 90% completed
15/02/2023 12:22:58:830 100% done:  InterProScan analyses completed

Some PIRSF lines do appear in the test output, though I can't tell whether they are what is supposed to be there or whether anything is missing:

 grep PIRSF test_all_appl.fasta_1.tsv
# yields:
UPI0004FABBC5   92e4b89dd86f8ab828f57121f6d7d460        257     PIRSF   PIRSF001789     NGFB    1       257     3.5E-102        T       15-02-2023      IPR020408       Nerve growth factor-like
UPI0002E0D40B   f91cb3cf61f2d7c7f5aaf6ea04e07868        370     PIRSF   PIRSF001220     L-ASNase_gatD   46      370     5.3E-89 T       15-02-2023      IPR006034       Asparaginase/glutaminase-like
UPI0002E0D40B   f91cb3cf61f2d7c7f5aaf6ea04e07868        370     PIRSF   PIRSF500176     L_ASNase        48      370     9.1E-76 T       15-02-2023      -       -
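One way to judge whether anything was lost is to compare the PIRSF accessions named in the warning lines with those that reached the TSV output. The sketch below assumes the standard tab-separated InterProScan TSV layout, with the analysis name in column 4 and the signature accession in column 5; the helper names are illustrative.

```python
import re

PIRSF_ID = re.compile(r"PIRSF\d+")
MARKER = "Unexpected Raw match line:"


def accessions_in_warnings(log_text):
    """Collect PIRSF accessions mentioned in 'Unexpected Raw match line' warnings."""
    found = set()
    for line in log_text.splitlines():
        if MARKER in line:
            # Only look at the raw text after the marker, not the logger prefix.
            found.update(PIRSF_ID.findall(line.split(MARKER, 1)[1]))
    return found


def accessions_in_tsv(tsv_text, analysis="PIRSF"):
    """Collect signature accessions (column 5) for rows of the given analysis (column 4)."""
    found = set()
    for line in tsv_text.splitlines():
        cols = line.split("\t")
        if len(cols) >= 5 and cols[3] == analysis:
            found.add(cols[4])
    return found


def unreported(log_text, tsv_text):
    """Accessions that appear in warnings but not in the TSV output."""
    return accessions_in_warnings(log_text) - accessions_in_tsv(tsv_text)
```

For the log and grep output quoted above, the three accessions in the warnings (PIRSF001789, PIRSF001220, PIRSF500176) are the same three found in the TSV rows, so `unreported` would be expected to return an empty set for this run.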

The test run without the -dp flag completed without any issues.

Everything was installed and run on an HPC cluster; OS: Rocky Linux 8.

Tsylvester8 commented 1 year ago

Maybe try this solution posted at https://github.com/ebi-pf-team/interproscan/issues/173#issuecomment-1412069679. I had a similar warning with the test dataset, and I could run InterProScan on the test set without any warnings after following their solution.

Rooksie commented 1 year ago

The above-mentioned solution worked for me; I had been getting the same error as the original poster.

Thank you!

marketavlkova commented 1 year ago

I tried the solution suggested by @Tsylvester8 (adding pirsf.pl.binary.switches=--outfmt i5 to interproscan.properties and re-running python3 setup.py interproscan.properties), but I'm still getting the same errors on my dataset. I also had no warnings, errors, or problems running the test data to begin with.
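For anyone applying the same change across several installations, the edit can be scripted. This is a minimal sketch that sets a key in a Java-style properties file. The key and value are the ones quoted in this thread, while the function name and the path to interproscan.properties are assumptions that depend on the local install.

```python
from pathlib import Path


def ensure_property(path, key, value):
    """Set key=value in a .properties file; return True if the file was changed."""
    file = Path(path)
    lines = file.read_text().splitlines()
    wanted = f"{key}={value}"
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("#") or "=" not in stripped:
            continue  # skip comments and lines that are not key=value pairs
        if stripped.split("=", 1)[0].strip() == key:
            if stripped == wanted:
                return False  # already set as requested
            lines[i] = wanted  # overwrite a differing value
            break
    else:
        lines.append(wanted)  # key not present at all
    file.write_text("\n".join(lines) + "\n")
    return True


# Example call (path is hypothetical; adjust to the actual install directory):
# ensure_property("interproscan.properties", "pirsf.pl.binary.switches", "--outfmt i5")
```

As described in this thread, python3 setup.py interproscan.properties is re-run after the file has been edited.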

I also created a new conda interproscan environment, in case the python3 setup.py interproscan.properties command does not overwrite the previous setup. The "Ignoring line with unexpected format" warnings persist.

marketavlkova commented 1 year ago

I solved the problem by installing an older InterProScan version without conda. I suspect the conda installation is the issue here, but it could also be version-specific, as I described here: https://github.com/slt666666/NLRtracker/issues/14#issuecomment-1443414467.