This PR introduces a major database feature and a few QOL/code maintenance features
new cipher feature: ASKCOS and retrosynthesis-based triggers
There is additional work to do here, but the core of the logic has been laid out:
added a new utility class to interact with ASKCOS web API: AskcosClient
added a utility class to calculate SA scores: SAScorer
added document schemas for both retrosynthetic trees (Retrosynthesis) and retrosynthetic difficulties (Difficulty)
added functions to create and save the corresponding documents for an input SMILES string
added unit tests for the SAScorer
Some things that remain:
I'm not certain the trigger code itself is fully complete, but this should be a relatively simple fix as the core logic has already been implemented
A retrosynthetic tree is a tree consisting of ChemicalNodes and ReactionNodes, rooted in a ChemicalNode. A ChemicalNode may be terminal or have a ReactionNode stemming from it. A ReactionNode itself then maps to 1+ ChemicalNodes. It's natural to represent a SyntheticTree using the EmbeddedDocumentField of mongoengine, but the circular dependency between ChemicalNodes and ReactionNodes has forced the use of a GenericEmbeddedDocumentField in the ReactionNode schema. Clearly, a ReactionNode always has ChemicalNode children, but I don't know how to enforce this in the schema definition without causing a "compilation" error
An AskcosClient can be configured with default tree parameters or accept a specific set in its get_trees() function. Currently, the trigger will be set up with a module-level AskcosClient that reads the default parameters from a configuration file on the VM filesystem. This isn't ideal because it can be a pain in the butt to get filepaths to work robustly across deployments, but we also don't want to potentially set 20+ parameters via environment variables
unit tests for tree building
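To make the tree invariant above concrete, here is a minimal plain-Python sketch (not the mongoengine schema in this PR) of the ChemicalNode/ReactionNode structure, with the "a ReactionNode only has ChemicalNode children" rule checked at validation time instead of in the field definition. The `smiles` fields, the `validate()` methods, and the `is_terminal` property are assumptions made for illustration; a similar check could probably live in a document-level validation hook if the generic field stays.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReactionNode:
    """A reaction whose children are its precursor chemicals (hypothetical fields)."""

    smiles: str
    children: List[ChemicalNode] = field(default_factory=list)

    def validate(self) -> None:
        # A reaction must map to one or more chemicals, and to nothing else.
        if not self.children:
            raise ValueError("ReactionNode needs at least one ChemicalNode child")
        for child in self.children:
            if not isinstance(child, ChemicalNode):
                raise TypeError(
                    f"expected ChemicalNode child, got {type(child).__name__}"
                )
            child.validate()


@dataclass
class ChemicalNode:
    """A chemical that is either terminal or produced by one reaction."""

    smiles: str
    reaction: Optional[ReactionNode] = None

    @property
    def is_terminal(self) -> bool:
        return self.reaction is None

    def validate(self) -> None:
        if self.reaction is None:
            return
        if not isinstance(self.reaction, ReactionNode):
            raise TypeError(
                f"expected ReactionNode, got {type(self.reaction).__name__}"
            )
        self.reaction.validate()


# Example: ethanol made from ethylene and water, both terminal.
tree = ChemicalNode(
    "CCO",
    reaction=ReactionNode(
        "C=C.O>>CCO",
        children=[ChemicalNode("C=C"), ChemicalNode("O")],
    ),
)
tree.validate()  # raises TypeError/ValueError if the invariant is broken
```

The trade-off is that the type rule is enforced when a tree is validated rather than when the schema is declared, which sidesteps the circular reference at definition time.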
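For the default-parameter question above, one possible middle ground is to point at the configuration file with a single environment variable and let per-call arguments override the file's defaults. The sketch below only illustrates that merging logic; it is not the AskcosClient from this PR. The class name `ParamClient`, the variable name `ASKCOS_CONFIG`, the JSON file format, and the parameter names are all assumptions.

```python
import json
import os
from pathlib import Path
from typing import Any, Dict, Optional


def config_path(env_var: str = "ASKCOS_CONFIG",
                fallback: str = "askcos_params.json") -> Path:
    """Return the path named by env_var if it is set, else the fallback path."""
    return Path(os.environ.get(env_var, fallback))


def load_default_params(path: Path) -> Dict[str, Any]:
    """Read default tree parameters from a JSON file; empty dict if it is absent."""
    path = Path(path)
    if not path.is_file():
        return {}
    with path.open() as fh:
        return json.load(fh)


class ParamClient:
    """Hypothetical stand-in that only handles parameter defaults and overrides."""

    def __init__(self, default_params: Optional[Dict[str, Any]] = None):
        # Copy so later changes to the caller's dict do not leak in.
        self.default_params = dict(default_params or {})

    def resolve_params(self, **overrides: Any) -> Dict[str, Any]:
        """Merge per-call overrides on top of the defaults (overrides win)."""
        return {**self.default_params, **overrides}


# Module-level client, as the trigger would set it up.
client = ParamClient(load_default_params(config_path()))
```

With this shape, a deployment sets one variable instead of 20+, and a call such as `client.resolve_params(max_depth=6)` changes a single (hypothetical) parameter without touching the file.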
QOL/maintenance features:
added flake8, black, and pre-commit configs to enforce linting (at the very least) and consistent code style (via black or some other formatter)
added setup.cfg so developers can more quickly set up their environment to start contributing. This will need more work though
Remaining thoughts:
We'll probably need to reorganize the repo to more clearly separate the data (e.g., our schema definitions) from the logic (i.e., the code that creates our documents and interacts with the DB itself). This way we can deploy individual elements of the DB without them all being co-dependent. E.g., if I want to deploy the ASKCOS trigger on a VM, I shouldn't also need the dependencies for the CANDO deployment.
unit tests should be a higher priority for new additions to the repo (even just some simple ones)
we should start using PRs instead of everyone committing to master