fteufel / signalp-6.0

Multi-class signal peptide prediction and structure decoding model.
https://services.healthtech.dtu.dk/service.php?SignalP-6.0
Other

Unknown sequence behaviour #7

Closed: J-Calvelo closed this issue 8 months ago

J-Calvelo commented 2 years ago

Hello, I got the following warnings when running signalp6 with "--organism euk":

Unknown behaviour encountered for sequence no. 12101. Please check outputs.
Unknown behaviour encountered for sequence no. 16029. Please check outputs.
Unknown behaviour encountered for sequence no. 25368. Please check outputs.

Are these sequence numbers from the original file? If so, these are the sequences:

>BSUD.16781.1.p1
MISLLTTFFLLLSPKVAGDCYGDTIARQKRLLGEMLDMSPAILSMEKLHADSVQQTLHEM
EIFQKYQFAEITPYEYKKLLLSRYLVFSASFALNICNQYGARLVEIMEEDERKAIAVMLD
LSDTPVDCLVIGTRFDKGDWTYWYSTRPAYRTLSTEENQQKGNCMILENQKSWNMTRVPC
LRTYFCHFMCEISMK*

>BSUD.18903.3.p1
MKPDHHAKMAAQMKERLKVEELAENIDELEDVVENAFPVLLVTVLVSIFLLAIFLVRMYL
RYTVENPSKNRMDGKTVLITGATSGLGKATAIELARKNARVLITGRDKIKVEAVARNIRK
KTGNQHVNALVLDLANLRGIREFCEAFCKDEKYLHVLINNAAYMGPKAATDDNLERCFGV
NYLGHFYLTYLLSDKLKKNAPSRVINVVSDSYAIGQLDFDDIALNKGYDVFKAYARSKYA
MMLWNLEHHRRTYSSCIWTFAVHPGACATELLRNYPGLTGNLLRIVSRIMFKAPEDGCQT
IVYLAVADGLREFSGKTFANCKVIKTQDRIKDKEVAKELWNISAHLCGFEPDTPYEEQES
TEAKETTTSDSPTADIAAAAAVSEQKKDK*

>BSUD.2410.1.p1
MKNRPSAAFRASAKPPTYCKMESQKEDEEDDGKGSRTMVLAGGGDGSNTEGAVPAKGGCG
EGRVLIFLVLGVLTLVFSGVLIGIYMNIRTLTSSLDVIEVMPSFVPAAAGGLAGLFLLGL
FWKRCVVLVYPVLVLCAVSTGLSIIIAVLTGTHVLQPLLSVSGCVYTRKGNICQCLTQFK
RDKLDLERVNAGETVYLALHNVSSCEDVQTVIPTMLYTMIGIYGLLALVSAVAGIISFLV
YRTERNRNYLDDTDYDEDEDSSPSTPSSNTDNYTEHQNMLSSRQANVTTAASVGNIYTNT
NATTNDEETNNDDGNTTPSDLTYNPSDAPMGYTEACKMRRCQSFTQPHKGAAREGSPGSS
ESGQTASSLMSSDRVADGGAIRLKENRKKGRRAVTLHGLDRDQLLLILSLQMRYLQESEQ
LAKKECQSALNLNNINKPNRTNVKNPNATSSDTSQENIDTSNTFSHFQRRAMTPTPRQER
LSSSRATNDDLDYKPAKQVRSHTPQPYHFKVQHNGVPATMGPVLLPNIPAQYQVEQQLQP
LQVQHLQQQQLQQQQQFLQFQQQQQLAQLQQQQQKIQMQQQLQLQQHLQVQQMSQHLPMQ
QPILVQQPALQSANHNMENESLTMITYDLRSVQTTGPIVYENVPSSRTSLNYYSPDGSCS
SNSSSLLRATNLPGYPSLSSNSVPTQTQPMPQPVESSPIKVLGGNIQTTTPNSPGQISRQ
TSEISSPSHSGNDSINTEQKQPGKSPESTPETQPAKAKGKKKLSKKEKNAKKEEEKTATT
DSTKTSTLGRTDSNASKAGSVSDRWQAVLPDGKAQAQTLWENVQRKIVSDPQTPDSTLTF
PHSSHPTSQTSVQPSAPNHFPNSSLVVPNGILKKTKSVPYSQQSSPNLAPVSPQSLPQFD
SQVPSNIYSSYNQMTNTPSYNAFIPIPTSSTDNYEDIDDFSRANQHAQQVQQDVPPPKPA
RLHARKPAPGAEAGDTLDSQGQQLRPKSYLSAVDRESMAAASLTSMGNVPCQPAYEGTNK
QSLTHMQMEGPVRDERSGGIVPAGGNDQLIMIYGDLYAQPRRKSIPTNLPLLPSLNQSHQ
MYQNSDIDHNLMDNQARAQYRLDVHQPQRSAFHMLGHTYHGDRSGHLPSNEICDTDLDEL
PIPRWNSRYHRSQSFSPPPYTPPPVYQSLESVGKYPSIRSTSSSSSDPHNSSSEGSTLDN
IQPGFTNNRLQGPPLSMSRGRQPAHVTDRRFCLTQQRQPPPNLYNHSGRRLNLKSDYDSF
RDRRVDQEPPPQVAIRRCQSVEEGNRKRLTSGQHLVNGKMVPHNNYYPNIDNIRTTSDGQ
NLENNLKMGPSVNRVRPIHNGSVPNTEQPAFQKGVQNFQNEAYIPNKGCQARKFPGDVSE
EDLSCSIDTDSVISDSSSQEVCPNKELNGFITHGRALESSDSDKDDYAETVI*

What is the cause? Thanks

fteufel commented 2 years ago

Hi,

SignalP 6 is essentially a model that predicts two things at the same time: a) the type of the signal peptide, and b) the region structure and the cleavage site, i.e. the label at each sequence position.
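As a rough illustration of these two outputs, a prediction can be modelled as one type string plus one label per residue, with the cleavage site read off the labels. This is a hypothetical sketch: the `Prediction` class, the label letters n/h/c/o and the 1-based position convention are assumptions for illustration, not SignalP 6's internal data structures.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-residue label letters, for illustration only:
# "n", "h", "c" = signal peptide regions, "o" = outside any signal peptide.
SP_REGION_LABELS = {"n", "h", "c"}


@dataclass
class Prediction:
    """One sequence's output: a global type plus one label per residue."""
    sp_type: str   # e.g. "Sec/SPI" or "No SP"
    labels: str    # one label character per sequence position

    def cleavage_site(self) -> Optional[int]:
        """Return the 1-based position of the last residue carrying a
        signal peptide region label (cleavage happens after it),
        or None if no residue carries such a label."""
        last = None
        for i, label in enumerate(self.labels):
            if label in SP_REGION_LABELS:
                last = i
        return None if last is None else last + 1


pred = Prediction(sp_type="Sec/SPI", labels="nnnhhhhhhccc" + "o" * 8)
print(pred.cleavage_site())  # 12
```

Deriving the cleavage site from the labels, rather than storing it separately, means the type and the site can disagree only through the labels, which is exactly where the mismatch described next shows up.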

Each type has a different region structure. Occasionally the model predicts one type but a region structure belonging to a different type (e.g. in the eukarya case, it predicts a Sec/SPI signal peptide but a region structure of type "No SP"). Because the model is a neural network, it is hard to tell exactly when and why this happens. Our prediction post-processing catches most of these cases, but a few still slip through. We are still working on a fix that prevents it from happening completely.
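To make the mismatch concrete, here is a minimal sketch of the kind of consistency check a post-processing step could run. The type names come from this reply; the label letters and the `ALLOWED_LABELS` table are hypothetical and do not reflect SignalP 6's actual encoding or its real post-processing.

```python
# Hypothetical label letters: "n", "h", "c" = signal peptide regions,
# "o" = not part of a signal peptide. Not SignalP 6's real encoding.
SP_LABELS = {"n", "h", "c"}

# Hypothetical table: which labels each predicted type may use.
ALLOWED_LABELS = {
    "No SP": {"o"},
    "Sec/SPI": SP_LABELS | {"o"},
}


def is_consistent(sp_type: str, labels: str) -> bool:
    """Return True if the per-residue labels fit the predicted type."""
    allowed = ALLOWED_LABELS[sp_type]
    # Every label must belong to the type's own region vocabulary.
    if any(label not in allowed for label in labels):
        return False
    # A signal peptide type needs at least one SP region label;
    # "No SP" must have none.
    has_sp_region = any(label in SP_LABELS for label in labels)
    return has_sp_region == (sp_type != "No SP")


# The mismatch described above: type Sec/SPI, but a "No SP" region structure.
print(is_consistent("Sec/SPI", "o" * 20))                # False
print(is_consistent("Sec/SPI", "nnhhhhcc" + "o" * 12))   # True
```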

For now, the tool gives you the warning (previous versions just crashed).

From a user perspective, you can always trust the type predictions; the warning does not affect them. If you also want to use the region structure and cleavage sites, I recommend manually inspecting the probability curves of the affected sequences to check that they look reasonable.
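Put into code, this advice amounts to keeping every type call while withholding the cleavage sites of flagged sequences until their probability curves have been checked. A minimal sketch; `apply_warnings` and the row layout are hypothetical and do not correspond to SignalP 6's output files.

```python
from typing import List, Optional, Set


def apply_warnings(types: List[str], sites: List[Optional[int]],
                   flagged: Set[int]) -> List[dict]:
    """Combine predictions with the zero-based indices named in warnings.

    Type calls are kept for every sequence; cleavage sites of flagged
    sequences are set to None and marked for manual review.
    """
    rows = []
    for i, (sp_type, site) in enumerate(zip(types, sites)):
        review = i in flagged
        rows.append({
            "index": i,
            "type": sp_type,                            # always usable
            "cleavage_site": None if review else site,  # withheld if flagged
            "needs_review": review,
        })
    return rows


rows = apply_warnings(["Sec/SPI", "No SP", "Sec/SPI"], [22, None, 18], flagged={2})
for row in rows:
    print(row)
```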

Hope this helps for now until we have a fix.

Yes, the numbers refer to the original file. However, they are zero-based (Python-style) indices: the first sequence in the file is 0, the second is 1, and so on. This can be confusing; I guess I should also change that in the next update.
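To map the warnings back to FASTA records, the reported numbers can be used directly as zero-based list indices. A minimal sketch, assuming the warnings were captured as text (e.g. copied from the console) and that the message wording matches the lines quoted in the question, which may change between versions; `flagged_records` and its helpers are hypothetical names.

```python
import re
from typing import List

# Matches the warning wording quoted in the question.
WARNING_RE = re.compile(r"Unknown behaviour encountered for sequence no\. (\d+)")


def flagged_indices(log_text: str) -> List[int]:
    """Extract the zero-based sequence numbers from the warning lines."""
    return [int(n) for n in WARNING_RE.findall(log_text)]


def read_fasta_headers(fasta_text: str) -> List[str]:
    """Return the header (without '>') of each record, in file order."""
    return [line[1:].strip() for line in fasta_text.splitlines()
            if line.startswith(">")]


def flagged_records(log_text: str, fasta_text: str) -> List[str]:
    headers = read_fasta_headers(fasta_text)
    # Index 0 is the first record in the file, so no "- 1" is needed.
    return [headers[i] for i in flagged_indices(log_text)]


log = "Unknown behaviour encountered for sequence no. 1. Please check outputs.\n"
fasta = ">seqA\nMKV\n>seqB\nMAL\n>seqC\nMST\n"
print(flagged_records(log, fasta))  # ['seqB']
```

Note that "sequence no. 1" resolves to the second record, not the first; subtracting 1 out of habit would point at the wrong sequence.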

J-Calvelo commented 2 years ago

It does, thanks!