ccdmb / predector

Effector prediction pipeline based on protein properties.
Apache License 2.0

BUG: SignalP3-NN fails for long sequences. #30

Closed darcyabjones closed 4 years ago

darcyabjones commented 4 years ago

The SignalP v3 NN model raises an error if an input sequence is too long. It's unclear what the threshold is. Maybe around 5000 AA?

Currently we just ignore the error and treat the result as missing data. A more correct approach might be to truncate the proteins first and run the truncated sequences through the model.

I don't think it's critical ATM. For future perfectionism.

darcyabjones commented 4 years ago

Turns out this was easy to fix.

SignalP3-NN dies for sequences over 6000 AA. We now truncate long sequences before running it.
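
For anyone hitting the same limit outside the pipeline, the truncation step could look roughly like the sketch below. This is not the pipeline's actual code: the names `read_fasta`, `truncate_records` and `MAX_LEN` are made up for illustration, and the 6000 AA cutoff is just the threshold reported above. It keeps the start of each sequence, on the assumption that the N-terminal end (where signal peptides sit) is the part the model needs.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: cutoff taken from the threshold reported in this issue.
MAX_LEN = 6000


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)


def truncate_records(
    records: Iterable[Tuple[str, str]],
    max_len: int = MAX_LEN,
) -> Iterator[Tuple[str, str]]:
    """Keep at most the first max_len residues of each sequence."""
    for header, seq in records:
        yield header, seq[:max_len]
```

Something like `truncate_records(read_fasta(open("proteins.fasta")))` (with a hypothetical input file) would then give records that are safe to write back out and pass to SignalP3-NN. Sequences at or under the cutoff pass through unchanged.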