Closed valentynbez closed 1 year ago
I tried to replicate it with randomly generated sequences, but now it always fails. Is this an alphabet issue?
import pyopal
from random import choice
alphabet = 'ACDEFGHIKLMNPQRSTVWY'
# generate proteins with length from 1k to 36k in steps of 1k
proteins = ["".join([choice(alphabet) for _ in range(i)]) for i in range(1000, 36000, 1000)]
database = pyopal.Database(proteins)
for sequence in proteins:
try:
results = database.search(sequence, mode="score", algorithm="nw")
except RuntimeError:
print("Failure", len(sequence))
I'll have a look when I have more time, I'm also getting failures with the random sequences.
Overflow detection was turned off for 32-bit integer scoring. This only affected the non-SW alignment modes, as SW uses a different function where overflow worked as expected. I updated the code in my fork of Opal.
Fixed in v0.4.1
.
I align some extremely long (>20,000 aa) proteins against AF2DB. Opal fails to do that at ~30,000 aa with
RuntimeError: Failed to run search Opal database (code=1)
for 2 longest sequences in the dataset, both inscore
andfull
mode.Code to reproduce:
data.zip