asapdiscovery / asapdiscovery

Toolkit for open antiviral drug discovery by the ASAP Discovery Consortium
https://asapdiscovery.org
MIT License

Very long filenames and titles in SDFs can cause issues that jam up workflow components #329

Closed hmacdope closed 8 months ago

hmacdope commented 1 year ago

Beware of very long (> 80 character) file names and titles in SDFs. These can cause issues, including segfaults, in components such as RDKit and OpenEye parsing.
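As a rough way to spot affected files before they reach a parser, the sketch below scans SDF text for over-long titles using plain Python. It relies only on the SDF layout, where the title is the first line of each record and records end with a `$$$$` line. The function name `long_sdf_titles` and the 80-character default are illustrative assumptions, not part of asapdiscovery.

```python
def long_sdf_titles(sdf_text: str, max_length: int = 80) -> list[str]:
    """Return the titles of SDF records that exceed max_length characters.

    Hypothetical helper for illustration only. A record's title is taken
    to be the first line of the record; records are delimited by "$$$$".
    Text after the last "$$$$" delimiter is ignored.
    """
    titles = []
    lines = sdf_text.splitlines()
    start = 0  # index of the first line of the current record
    for i, line in enumerate(lines):
        if line.strip() == "$$$$":
            if start < i:
                titles.append(lines[start])
            start = i + 1
    return [title for title in titles if len(title) > max_length]
```

This only reports the problem; it does not rewrite the file, and it says nothing about over-long file names, which would need a separate check on the path.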

hmacdope commented 1 year ago

This is fixed in #246 by a function that truncates over-long file names: asapdiscovery.data.utils.check_name_length_and_truncate
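For readers unfamiliar with the approach, a minimal sketch of length-based truncation is shown below. This is not the code from #246: the name `truncate_name`, its signature, and the 80-character default are assumptions made for illustration, and the real function may behave differently (for example, it may log a warning).

```python
def truncate_name(name: str, max_length: int = 80) -> str:
    """Return name unchanged if it fits, else its first max_length characters.

    Hypothetical stand-in, not the asapdiscovery implementation.
    """
    if len(name) <= max_length:
        return name
    return name[:max_length]
```

Names at or below the limit pass through untouched, so the function is safe to apply to every name unconditionally.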

JenkeScheen commented 1 year ago

This highlights the need for us to start using some form of UUID for compounds. At least for the FECs workflow we're still using InChI for compound names. These can exceed 80 characters and are thus truncated, which risks creating duplicate names between compounds.

We should probably start using the low-level compound identifier that the PostEra API uses (it's a random string, so it should be fine to use).
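The collision risk can be shown with a small sketch. The two names below are made-up placeholders that share their first 80 characters, not real InChI strings, and `uuid.uuid4()` from the Python standard library stands in for the PostEra identifier, whose actual format is not described here. The helper name `assign_ids` is likewise hypothetical.

```python
import uuid

MAX_LEN = 80

# Two made-up long names that differ only after character 80.
prefix = "X" * MAX_LEN
name_a = prefix + "-A"
name_b = prefix + "-B"

# Truncating both to MAX_LEN collapses them onto a single name.
truncated = {name_a[:MAX_LEN], name_b[:MAX_LEN]}


def assign_ids(names):
    """Map each full name to a short random identifier (32 hex characters).

    Hypothetical helper: uuid4 is a stand-in for whatever identifier
    the upstream API provides.
    """
    return {name: uuid.uuid4().hex for name in names}


ids = assign_ids([name_a, name_b])
```

Here `truncated` ends up holding one entry for two distinct compounds, while `ids` keeps them apart with identifiers well under the 80-character limit. The mapping from identifier back to the full name would need to be stored somewhere, since a random identifier carries no chemical information.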

hmacdope commented 1 year ago

Reopening as a discussion point.

hmacdope commented 8 months ago

We do this now.