Closed darcyabjones closed 2 years ago
Hi,
Please do keep bothering me! This helps a lot.
Turns out this is a new bug I introduced myself with the 6.0e update (which was supposed to catch all edge cases). I'll update my CI to also do the test runs with the eukarya mode enabled from now on...
Anyway, I should have an updated version online tomorrow. If it's urgent, you could add the following quick fix at line 237 of predict.py:
    if args.organism == 'eukarya':
        global_probs[:,1] = global_probs[:,1:].sum(axis=1)
        global_probs[:,2:] = 0
I'll look into this testing framework! So far we have relied on running large reference proteomes to identify the edge cases.
Will close the issue once the updated downloads go live.
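For anyone applying the quick fix above by hand, here is a minimal sketch of what those two assignments do to the array. It assumes global_probs is a 2-D NumPy array with one row per sequence and one column per prediction class; the meaning of each column is not spelled out in this thread, and the function name and example values are made up for illustration.

```python
import numpy as np


def collapse_to_second_column(global_probs: np.ndarray) -> np.ndarray:
    """Fold the probability mass of columns 1..end into column 1 and zero the rest.

    Mirrors the two assignments of the quick fix, but works on a copy so the
    caller's array is left untouched (the quick fix itself modifies in place).
    """
    probs = global_probs.copy()
    # The right-hand side is evaluated first, so column 1 receives the sum of
    # the original columns 1, 2, 3, ...
    probs[:, 1] = probs[:, 1:].sum(axis=1)
    # All remaining columns are cleared, so each row keeps its original total.
    probs[:, 2:] = 0
    return probs


# Hypothetical input: 2 sequences, 4 classes, each row sums to 1.
example = np.array([
    [0.10, 0.20, 0.30, 0.40],
    [0.70, 0.10, 0.10, 0.10],
])
collapsed = collapse_to_second_column(example)
print(collapsed)
```

Each row still sums to the same total afterwards; only the split between column 1 and the later columns is lost.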
Hey again,
Sorry, just in case you haven't found other issues yet, I've got another one that still fails with your patch to 6.0e. Same error.
>P00000D45
MYSRLFYLKSSYIIYFEPLFSNAIINILSFINSLASPLTIFCFALSAQALSTIFYFRIFI
FIFHSWILLFHFYFTCSFKTYEHQHSKMVPAYRMQSPRALPRTYLYVWPYK
Hi,
I am getting similar issues with the signalp6g. Any help would be great.
signalp6 -fasta ${!sample}.fasta -org euk -format txt -m slow-sequential --output_dir ${!sample}_signalP6
Predicting 6/6: 100%|██████████| 69500/69500 [3:50:56<00:00, 5.02sequences/s]
Traceback (most recent call last):
File "/home/.local/bin/signalp6", line 8, in
Thanks
Hi @B10inform, can you provide me with the FASTA data for which this occurs? I'll look into it then.
Hi fteufel,
Here is the link to the FASTA file: https://solgenomics.net/ftp/genomes/Nicotiana_benthamiana/annotation/Niben101/Niben101_annotation.proteins.fasta.gz
Hi fteufel,
Were you able to look into this issue?
Thanks
Hi, I could not reproduce your error, so it must be related to your installation. I reinstalled from the download server and the prediction finished without an error. I suspect it is
>ben101Scf02573g00010.1
MKAAAMSTPANAAPPMTALLAAFGGGVLSAVGCSAGEAPGPPAGVGAGGEPARPPAGAGD
GEVVEADGDGVGEVVGDGDGVAVGGDTAGAGTGVDGDGVGEVVGDGDGVAVGGDTAGAGT
GVGVAAGEILGAGAGD
that is causing the problem. It yields a malformed region prediction, but in the current version this only raises a warning and does not crash.
Hi there!
Sorry to bother you again. I'm still running into issues with the decoding step.
Running this sequence with SignalP 6.0e raises an error:
I haven't had a huge amount of time to debug it (or decipher how it all works), but it seems as though the marginal probabilities in type_marginal_probs are all assigning it to the PAD token, so you end up with a zero-length array at np.where(np.isin(marginal_region_preds, [5, 10, 19, 25, 31]))[0].
I wonder if a property-based testing framework (like https://hypothesis.readthedocs.io/en/latest/) would be helpful for finding all of these edge cases so they can be handled appropriately? It seems to have become a troublesome issue.
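To make the failure mode concrete, here is a rough sketch of guarding against the empty np.where result, plus a tiny hand-rolled property check. Only the label list and the np.where expression come from the comment above. The helper names, the label range of 37, and the choice to return None are assumptions for illustration and not how SignalP handles it. The random generator is a stdlib stand-in for what Hypothesis strategies would generate (and shrink) automatically.

```python
import random

import numpy as np

# Label ids copied from the expression quoted above; their meaning is not
# explained in this thread.
TARGET_LABELS = [5, 10, 19, 25, 31]


def first_target_index(marginal_region_preds, target_labels=TARGET_LABELS):
    """Return the first index whose label is in target_labels, or None if none match."""
    hits = np.where(np.isin(marginal_region_preds, target_labels))[0]
    if hits.size == 0:
        # For example, every position was assigned to a padding label.
        # Indexing or reducing this empty array would raise, so report "no hit".
        return None
    return int(hits[0])


def check_property(n_cases=500, max_len=70, n_labels=37, seed=0):
    """Randomized check: the helper never crashes and its answer is consistent.

    n_labels=37 is a made-up label range, chosen only so that it covers the
    ids in TARGET_LABELS.
    """
    rng = random.Random(seed)
    for _ in range(n_cases):
        length = rng.randint(0, max_len)  # includes the empty sequence
        preds = np.array([rng.randrange(n_labels) for _ in range(length)], dtype=int)
        idx = first_target_index(preds)
        if idx is None:
            assert not np.isin(preds, TARGET_LABELS).any()
        else:
            assert preds[idx] in TARGET_LABELS
            assert not np.isin(preds[:idx], TARGET_LABELS).any()
    return True


print(first_target_index(np.array([0, 0, 0])))  # all "padding-like" labels
print(check_property())
```

The point of the property is that it holds for any label sequence, including empty and all-padding ones, which are exactly the inputs a reference proteome may never happen to contain.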