ebi-pf-team / interproscan

Genome-scale protein function classification
Apache License 2.0

SignalP fails to run because the default batch size (16000) is greater than the default max sequences (10000) in the SignalP script #51

Closed sujaikumar closed 6 years ago

sujaikumar commented 6 years ago

Discovered a weird problem today:

By default, the SignalP Perl script downloaded from http://www.cbs.dtu.dk/services/SignalP/ sets the maximum number of sequences to 10,000:

# max number of sequences per run (any number can be handled)
my $MAX_ALLOWED_ENTRIES=10000;

However, the default batch size for SignalP in interproscan.properties is 16,000:

analysis.max.sequence.count.SIGNALP=16000

As a result, when I submit a file with, say, 20,000 sequences, it gets split into two batches (1-16000 and 16001-20000), but only the second one finishes: the first batch contains 16,000 sequences, which is more than the 10,000 that the SignalP script accepts.
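To make the arithmetic concrete, here is a minimal sketch of the 20,000-sequence case described above. It is a simplified model of fixed-size batching, not InterProScan's actual code; the function names `split_into_batches` and `oversized_batches` are hypothetical.

```python
# Simplified model of fixed-size batching (an assumption, not InterProScan's
# real implementation). Function names are hypothetical.

def split_into_batches(total, batch_size):
    """Return inclusive 1-based (start, end) ranges covering `total` sequences."""
    return [
        (start, min(start + batch_size - 1, total))
        for start in range(1, total + 1, batch_size)
    ]


def oversized_batches(batches, max_entries):
    """Return the batches holding more sequences than `max_entries`."""
    return [(start, end) for start, end in batches if end - start + 1 > max_entries]


# Values from this report: 20,000 input sequences,
# analysis.max.sequence.count.SIGNALP=16000, $MAX_ALLOWED_ENTRIES=10000.
batches = split_into_batches(20000, 16000)
print(batches)                            # [(1, 16000), (16001, 20000)]
print(oversized_batches(batches, 10000))  # [(1, 16000)]
```

With a batch size of 10,000 the same input splits into two batches of 10,000 each, and neither exceeds the script's limit.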

Setting either my $MAX_ALLOWED_ENTRIES=10000000; in the SignalP script or analysis.max.sequence.count.SIGNALP=10000 in interproscan.properties solves the problem.

May I request that future releases of interproscan.properties set analysis.max.sequence.count.SIGNALP=10000 and include a warning to check that this value is <= the limit in the SignalP script?
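Until such a warning exists, the check can be done by hand. The following is a rough sketch that reads the two settings quoted above from the text of each file and compares them. The helper names are hypothetical, and it assumes both settings appear on their own lines in the form shown in this report; the file contents are passed in as strings because the install paths vary.

```python
import re

# Hypothetical helpers; they assume the settings look exactly like the
# lines quoted in this report.

def signalp_script_limit(perl_source):
    """Extract $MAX_ALLOWED_ENTRIES from the SignalP Perl script text."""
    match = re.search(
        r"^\s*my\s+\$MAX_ALLOWED_ENTRIES\s*=\s*(\d+)\s*;",
        perl_source,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


def interproscan_batch_size(properties_text):
    """Extract analysis.max.sequence.count.SIGNALP from interproscan.properties text."""
    match = re.search(
        r"^\s*analysis\.max\.sequence\.count\.SIGNALP\s*=\s*(\d+)\s*$",
        properties_text,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


def batch_size_is_safe(perl_source, properties_text):
    """True if the InterProScan batch size is <= the SignalP script limit."""
    limit = signalp_script_limit(perl_source)
    batch = interproscan_batch_size(properties_text)
    if limit is None or batch is None:
        raise ValueError("could not find one of the two settings")
    return batch <= limit


perl_source = (
    "# max number of sequences per run (any number can be handled)\n"
    "my $MAX_ALLOWED_ENTRIES=10000;\n"
)
print(batch_size_is_safe(perl_source, "analysis.max.sequence.count.SIGNALP=16000\n"))  # False
print(batch_size_is_safe(perl_source, "analysis.max.sequence.count.SIGNALP=10000\n"))  # True
```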

gsn7 commented 6 years ago

The new release has the cutoff updated. Thanks.