gadsbyfly / PyBioMed

machine learning, molecular descriptor
http://pybiomed.readthedocs.io/en/latest/index.html
BSD 3-Clause "New" or "Revised" License

ZeroDivisionError when computing molecular descriptors #1

Open lorenzoFabbri opened 5 years ago

lorenzoFabbri commented 5 years ago

I'm trying to use PyBioMed to compute protein molecular descriptors. Everything works fine, except that for some sequences (such as "RPDDEWY") I get the following error:

ZeroDivisionError: integer division or modulo by zero.

Specifically, this happens when calling GetPAAC:

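from PyBioMed import Pyprotein

sequence = "RPDDEWY"  # one of the sequences that triggers the error
descriptors = Pyprotein.PyProtein(sequence)
descriptors.GetPAAC()  # raises ZeroDivisionError for this sequence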

The last line of the traceback points to _GetSequenceOrderCorrelationFactor, but if I call that function on its own, it works fine. If I instead call _GetPseudoAAC or _GetPseudoAAC1, I get the same error.

lorenzoFabbri commented 5 years ago

I see now that _GetSequenceOrderCorrelationFactor is called many times (once per correlation rank up to lamda) and that the sequence I provided is too short for the lamda being used.
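Why a short sequence fails can be sketched with Chou's definition of the rank-k sequence-order correlation factor, theta_k = (1 / (L - k)) * sum over i of Theta(R_i, R_{i+k}), which averages over the L - k residue pairs that are k positions apart. This is a minimal sketch assuming PyBioMed follows that definition; it is not PyBioMed's code, and the names below (TOY_SCALE, correlation, sequence_order_factor, all_factors) are hypothetical. TOY_SCALE is a placeholder property table, not real hydrophobicity or mass values.

```python
from typing import Dict, List

# Hypothetical toy property scale (NOT real PyBioMed/Chou values); any
# per-residue numeric table shows the same arithmetic.
TOY_SCALE: Dict[str, float] = {aa: float(i) for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY")}


def correlation(a: str, b: str, scales: List[Dict[str, float]]) -> float:
    """Theta(R_i, R_j): mean squared property difference over the scales."""
    return sum((s[b] - s[a]) ** 2 for s in scales) / len(scales)


def sequence_order_factor(seq: str, k: int, scales: List[Dict[str, float]]) -> float:
    """theta_k: average correlation over the len(seq) - k residue pairs k apart."""
    pairs = len(seq) - k
    total = sum(correlation(seq[i], seq[i + k], scales) for i in range(pairs))
    return total / pairs  # pairs == 0 when k == len(seq) -> ZeroDivisionError


def all_factors(seq: str, lamda: int, scales: List[Dict[str, float]]) -> List[float]:
    """theta_1 .. theta_lamda: the factor is evaluated once per rank."""
    return [sequence_order_factor(seq, k, scales) for k in range(1, lamda + 1)]
```

"RPDDEWY" has 7 residues, so all_factors("RPDDEWY", 6, [TOY_SCALE]) succeeds, while any lamda of 7 or more reaches k = 7, where there are no residue pairs left and the division by zero occurs. Calling the factor directly with a small k works, which matches the observation that the function seems fine on its own.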

For now, I changed the value of lamda to 5 inside all the functions (in Pyprotein.py and PseudoAAC), and this solves the problem. I wonder, though, whether changing its value will cause any other problems in the descriptors themselves. Thanks.
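One point to weigh when changing lamda: in Chou's formulation the pseudo amino acid composition has 20 + lamda components, and a smaller lamda simply captures fewer sequence-order ranks. For a machine-learning dataset the more important constraint is that every sequence uses the same lamda, so the feature vectors line up. A minimal sketch, using hypothetical helpers max_lamda and common_lamda (not part of PyBioMed), picks one lamda that is valid for the shortest sequence instead of editing the library's defaults. Passing the result on assumes your PyBioMed version's GetPAAC accepts a lamda keyword argument; check its signature before relying on that.

```python
from typing import Iterable


def max_lamda(seq: str) -> int:
    """Largest rank that keeps len(seq) - k > 0 for every k in 1..lamda."""
    return max(len(seq) - 1, 0)


def common_lamda(sequences: Iterable[str], requested: int) -> int:
    """One lamda for a whole dataset, clamped to what the shortest sequence allows.

    Hypothetical helper: raises ValueError for a negative request or an
    empty collection of sequences.
    """
    if requested < 0:
        raise ValueError("lamda must be a non-negative integer")
    return min(requested, min(max_lamda(s) for s in sequences))
```

For example, common_lamda(["RPDDEWY", "MKTAYIAKQR"], 30) gives 6, because the 7-residue sequence allows ranks 1 to 6 only.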

ifyoungnet commented 4 years ago

@lorenzoFabbri Thanks for your comments. The lamda factor reflects the rank of correlation and is a non-negative integer, such as 15. Note that (1) lamda should NOT be equal to or larger than the length of the input protein sequence; (2) lamda must be a non-negative integer, such as 0, 1, 2, ...; (3) when lamda = 0, the output of the PseAA server is the 20-D amino acid composition.
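The three rules above follow from Chou's original (Type I) pseudo amino acid composition: X_u = f_u / (sum of f + w * sum of theta_j) for the 20 composition terms and X_{20+k} = w * theta_k / (same denominator) for k = 1..lamda, where f_u are normalized residue frequencies and w is the weight. A minimal sketch under that assumption is below; it is not PyBioMed's implementation, the function names are hypothetical, and the caller must supply standardized property scales (Chou's definition uses hydrophobicity, hydrophilicity and side-chain mass). The sketch rejects lamda equal to the sequence length, because the last rank would then have zero residue pairs.

```python
from statistics import mean, pstdev
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(scale: Dict[str, float]) -> Dict[str, float]:
    """Zero-mean, unit-variance version of a property scale over the 20 residues."""
    values = [scale[aa] for aa in AMINO_ACIDS]
    mu, sigma = mean(values), pstdev(values)
    return {aa: (scale[aa] - mu) / sigma for aa in AMINO_ACIDS}


def theta(seq: str, k: int, scales: List[Dict[str, float]]) -> float:
    """Rank-k sequence-order correlation factor (average over len(seq) - k pairs)."""
    pairs = len(seq) - k
    total = sum(
        sum((s[seq[i + k]] - s[seq[i]]) ** 2 for s in scales) / len(scales)
        for i in range(pairs)
    )
    return total / pairs


def pseaac(seq: str, lamda: int, weight: float,
           scales: List[Dict[str, float]]) -> List[float]:
    """Type I PseAAC: 20 composition terms followed by lamda correlation terms."""
    if not isinstance(lamda, int) or lamda < 0:  # rule (2)
        raise ValueError("lamda must be a non-negative integer")
    if lamda >= len(seq):  # rule (1): rank lamda needs at least one residue pair
        raise ValueError("lamda must be smaller than the sequence length")
    freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    thetas = [theta(seq, k, scales) for k in range(1, lamda + 1)]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

With lamda = 0 the correlation list is empty and the denominator is just the sum of the frequencies, so the result is the plain 20-D amino acid composition, matching rule (3). With lamda = 6 and "RPDDEWY" the vector has 26 entries.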