fteufel / signalp-6.0

Multi-class signal peptide prediction and structure decoding model.

Numpy error with 6.0b for some sequences #1

Closed darcyabjones closed 2 years ago

darcyabjones commented 2 years ago

Hi there!

Thanks for your great work. I was testing out the update on the signalp5 benchmark set and came across an issue during the marginal conflict resolution step.

This sequence:


from https://services.healthtech.dtu.dk/services/SignalP-6.0/public_data/benchmark_set_sp5.fasta appears to be the issue.

Running version 6.0b in "fast" mode on this sequence, with the organism set to either "other" or eukaryote, causes the following error.

$ signalp6 --output_dir test --format txt --organism euk --mode fast --fastafile test.fasta

/home/ubuntu/miniconda3/envs/sp6/lib/python3.6/site-packages/torch/nn/modules/module.py:1051: UserWarning: where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version of PyTorch. Use a boolean condition instead. (Triggered internally at  /tmp/pip-req-build-1ky46svp/aten/src/ATen/native/TensorCompare.cpp:255.)
  return forward_call(*input, **kwargs)
Predicting: 100%|| 1/1 [00:00<00:00,  1.53batch/s]
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/sp6/bin/signalp6", line 8, in <module>
  File "/home/ubuntu/miniconda3/envs/sp6/lib/python3.6/site-packages/signalp/__init__.py", line 6, in predict
  File "/home/ubuntu/miniconda3/envs/sp6/lib/python3.6/site-packages/signalp/predict.py", line 235, in main
    resolve_viterbi_marginal_conflicts(global_probs, marginal_probs, cleavage_sites, viterbi_paths)
  File "/home/ubuntu/miniconda3/envs/sp6/lib/python3.6/site-packages/signalp/utils.py", line 254, in resolve_viterbi_marginal_conflicts
    cleavage_sites[i] = sp_idx.max() +1
  File "/home/ubuntu/miniconda3/envs/sp6/lib/python3.6/site-packages/numpy/core/_methods.py", line 39, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial, where)
ValueError: zero-size array to reduction operation maximum which has no identity
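
The final ValueError is what NumPy raises whenever .max() is called on an empty array. The sketch below is independent of the SignalP code and only illustrates that failure mode together with one possible guard. The name sp_idx is borrowed from the traceback; the helper last_index_plus_one, the label values, and the -1 sentinel are hypothetical.

```python
import numpy as np


def last_index_plus_one(path, sp_labels):
    """Return (last index carrying an SP label) + 1, or -1 if there is none.

    Hypothetical helper for illustration only; not SignalP's actual code.
    """
    sp_idx = np.where(np.isin(path, sp_labels))[0]
    if sp_idx.size == 0:
        # Without this guard, sp_idx.max() raises the ValueError seen above.
        return -1
    return int(sp_idx.max()) + 1


# Reproduce the error message from the traceback on an empty index array.
try:
    np.array([], dtype=int).max()
    message = ""
except ValueError as err:
    message = str(err)
```

A path that contains no signal peptide positions leaves sp_idx empty, which would match the reported crash if the conflict resolution step expected at least one such position.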

This doesn't appear to be an issue with the previous version available for download. Both have identical main dependency versions:

python 3.6.13, numpy 1.19.5, pytorch 1.9.1, tqdm 4.62.3

Thanks in advance, Darcy

fteufel commented 2 years ago

Hi Darcy, thanks a lot for raising this!

Turns out there was an issue in the conflict resolution function (resolve_viterbi_marginal_conflicts) when processing Sec/SPII and Tat/SPII lipoproteins. I added logic to handle those as a separate case: when the cleavage site is missing, it is imputed from the predicted modified cysteine that follows it.
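
The idea of imputing a missing lipoprotein cleavage site from the cysteine position could look roughly like the sketch below. This is not the actual patch: the label values SP_LABELS and CYS_LABEL, the function impute_cleavage_site, and the -1 sentinel are all assumptions made for illustration. It relies only on the convention visible in the traceback, where the cleavage site is the last signal peptide index plus one.

```python
import numpy as np

SP_LABELS = (1, 2, 3)  # hypothetical labels for signal peptide positions
CYS_LABEL = 4          # hypothetical label for the modified cysteine


def impute_cleavage_site(path):
    """Return a cleavage site for one decoded path, or -1 if none is found.

    Illustrative only; the real SignalP state encoding is not shown here.
    """
    path = np.asarray(path)
    sp_idx = np.where(np.isin(path, SP_LABELS))[0]
    if sp_idx.size > 0:
        # Normal case: last signal peptide position, converted to 1-based.
        return int(sp_idx.max()) + 1
    cys_idx = np.where(path == CYS_LABEL)[0]
    if cys_idx.size > 0:
        # Fallback: the cysteine directly follows the cleavage site, so its
        # 0-based index equals the 1-based index of the residue before it.
        return int(cys_idx.min())
    return -1
```

For example, under these assumed labels a path of [0, 0, 4, 0] has no signal peptide positions, and the fallback places the cleavage site at 2, immediately before the cysteine.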


The online version is patched; I'll close the issue once the updated downloads go live.

darcyabjones commented 2 years ago

Cool, thanks!