JacksonBurns / mordred-community

Community-Maintained Version of mordred
https://jacksonburns.github.io/mordred-community/
BSD 3-Clause "New" or "Revised" License

Fix MDE overflow bug by computing the log of dx and then exponentiating #16

Closed: jacob-osmo closed this 3 months ago

jacob-osmo commented 3 months ago

I see that the line dx = prod(Dv, dtype=longdouble) ** (1.0 / (2.0 * n)) was added in response to an overflow error that caused some Mordred descriptors to be silently equal to zero.

I still got an overflow error (and therefore descriptors silently equal to 0), because the prod itself overflows for large enough SMILES strings such as CCCCCC=CCCC(CCCCCCCC(=O)O)C(CCCCCCCC)CCCCCCCCC(=O)O

This PR fixes the overflow by computing log_dx first and then exponentiating to get dx:

log_dx = np.sum(np.log(Dv)) / (2.0 * n)
dx = np.exp(log_dx)

I think this change resolves the overflow issue.
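
For anyone who wants to see the difference outside of Mordred, here is a minimal standalone sketch. The helper names dx_naive and dx_log, the synthetic Dv array, and the choice n = len(Dv) are illustrative assumptions, not code from the repository. It also assumes every entry of Dv is strictly positive, which the log requires. The values are chosen so that the product exceeds the range of longdouble on common platforms.

```python
import numpy as np


def dx_naive(Dv, n):
    """Product-then-root form quoted above (hypothetical helper name)."""
    # Silence NumPy's overflow warning so the demo output stays readable.
    with np.errstate(over="ignore"):
        return np.prod(Dv, dtype=np.longdouble) ** (1.0 / (2.0 * n))


def dx_log(Dv, n):
    """Log-sum-then-exponentiate form proposed in this PR (hypothetical helper name)."""
    log_dx = np.sum(np.log(Dv)) / (2.0 * n)
    return np.exp(log_dx)


# Synthetic input (an assumption for illustration): 2000 values of 1000.0,
# so the raw product is 10**6000.
Dv = np.full(2000, 1000.0)
n = len(Dv)

naive = dx_naive(Dv, n)   # the intermediate product overflows to inf
stable = dx_log(Dv, n)    # stays finite, close to sqrt(1000)

# On small inputs where nothing overflows, both forms should agree.
small = np.array([1.0, 2.0, 3.0, 4.0])
agree = np.isclose(float(dx_naive(small, 4)), float(dx_log(small, 4)))

print(naive, stable, agree)
```

The two forms are mathematically the same quantity, the geometric mean of Dv raised to the power 1/2. The log form just never materialises the huge intermediate product, so only the final, modest value has to fit in a float.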

JacksonBurns commented 3 months ago

Hi @jacob-osmo, thanks for the clever suggestion! The CI looks good, so I will go ahead and merge this and then do a new release.

Much appreciated!