The moleculeresolver was born out of the need to quickly annotate large datasets with accurate structural information and to cross-check whether the given metadata (name, SMILES) are consistent with each other. It also makes it possible to efficiently determine which structures are available in both of two large datasets.
In short, it is a Python module that retrieves molecular structures from multiple chemical databases, cross-checks the results to ensure data reliability, and standardizes each molecule to its best representation. It also provides functions for comparing molecules and sets of molecules according to configurable settings. This makes it a useful tool for researchers, chemists, and anyone working in computational chemistry or cheminformatics who needs to be sure they are working with the best available data for a molecule.
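The comparison of two large datasets mentioned above becomes a set operation once every entry has been resolved to a standardized structure. The sketch below is a minimal illustration, not moleculeresolver's own code: it assumes each dataset is a plain dict mapping an entry name to a canonical SMILES string produced by the same toolkit, and `compare_datasets` is a hypothetical helper.

```python
def compare_datasets(dataset_a, dataset_b):
    """Report which structures two annotated datasets have in common.

    Each dataset maps an entry name to a standardized structure key (here a
    canonical SMILES string). Comparing these keys instead of the names means
    synonyms such as "aspirin" and "acetylsalicylic acid" still match.
    Hypothetical helper, not part of moleculeresolver's API.
    """
    keys_a = set(dataset_a.values())
    keys_b = set(dataset_b.values())
    return {
        "shared": keys_a & keys_b,      # structures present in both datasets
        "only_in_a": keys_a - keys_b,   # structures missing from dataset_b
        "only_in_b": keys_b - keys_a,   # structures missing from dataset_a
    }


dataset_a = {"aspirin": "CC(=O)Oc1ccccc1C(=O)O", "1-propanol": "CCCO"}
dataset_b = {"acetylsalicylic acid": "CC(=O)Oc1ccccc1C(=O)O", "ethanol": "CCO"}
print(compare_datasets(dataset_a, dataset_b)["shared"])  # {'CC(=O)Oc1ccccc1C(=O)O'}
```

String comparison is only meaningful if both datasets were canonicalized the same way, which is exactly what the standardization step is for.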
The package is available on PyPI:

```bash
pip install molecule-resolver
```
To use Molecule Resolver, first import and initialize the MoleculeResolver class. It is intended to be used as a context manager:

```python
from moleculeresolver import MoleculeResolver

with MoleculeResolver(available_service_API_keys={"chemeo": "YOUR_API_KEY"}) as mr:
    ...  # work with the resolver inside this block
```
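The README does not say why a context manager is required. A common reason for this pattern is that an object holds resources which must be opened when the `with` block starts and released when it ends. The following is a hypothetical stand-in that assumes such a resource (an in-memory SQLite cache); `CachedResolver`, `store`, and `lookup` are illustrative names, not moleculeresolver's implementation.

```python
import sqlite3


class CachedResolver:
    """Hypothetical class illustrating the context-manager pattern.

    Assumes a resolver owns a resource (here an SQLite cache) that should be
    opened on entry and closed on exit. NOT moleculeresolver's actual code.
    """

    def __init__(self, cache_path=":memory:"):
        self.cache_path = cache_path
        self.connection = None

    def __enter__(self):
        # Acquire the resource when the `with` block starts.
        self.connection = sqlite3.connect(self.cache_path)
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS cache (identifier TEXT PRIMARY KEY, smiles TEXT)"
        )
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Keep results only if the block finished without an exception.
        if exc_type is None:
            self.connection.commit()
        else:
            self.connection.rollback()
        self.connection.close()
        self.connection = None
        return False  # never suppress exceptions

    def store(self, identifier, smiles):
        self.connection.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?)", (identifier, smiles)
        )

    def lookup(self, identifier):
        row = self.connection.execute(
            "SELECT smiles FROM cache WHERE identifier = ?", (identifier,)
        ).fetchone()
        return row[0] if row else None


with CachedResolver() as resolver:
    resolver.store("50-78-2", "CC(=O)Oc1ccccc1C(=O)O")
    print(resolver.lookup("50-78-2"))  # CC(=O)Oc1ccccc1C(=O)O
```

Because cleanup lives in `__exit__`, it runs even if a lookup raises, which is the main benefit of the pattern.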
Retrieve a molecule using both its common name and CAS number, then compare the two to ensure they represent the same structure:
```python
from rdkit import Chem
from moleculeresolver import MoleculeResolver

with MoleculeResolver(available_service_API_keys={"chemeo": "YOUR_API_KEY"}) as mr:
    # Look up the same compound once by name and once by CAS number.
    molecule_name = mr.find_single_molecule(["aspirin"], ["name"])
    molecule_cas = mr.find_single_molecule(["50-78-2"], ["cas"])
    # Compare the two resulting structures.
    are_same = mr.are_equal(Chem.MolFromSmiles(molecule_name.SMILES),
                            Chem.MolFromSmiles(molecule_cas.SMILES))
    print(f"Are the molecules the same? {are_same}")
```
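Comparing a name lookup with a CAS lookup is one form of cross-check. A more general version of the idea is to accept a structure only when several independent sources agree on it. The sketch below is a hypothetical helper, not moleculeresolver's own logic; it assumes each service's answer has already been converted to a canonical SMILES string by the same toolkit, and the service names are placeholders.

```python
from collections import Counter


def consensus_structure(candidates, min_agreement=2):
    """Return the structure that enough services agree on, or None.

    `candidates` maps a (hypothetical) service name to the canonical SMILES it
    returned, or None if that service found nothing.
    """
    # Ignore services that returned no result.
    found = [smiles for smiles in candidates.values() if smiles]
    if not found:
        return None
    # Count how many services returned each distinct structure.
    smiles, votes = Counter(found).most_common(1)[0]
    # Only accept the structure if enough sources agree on it.
    return smiles if votes >= min_agreement else None


aspirin_hits = {
    "service_a": "CC(=O)Oc1ccccc1C(=O)O",
    "service_b": "CC(=O)Oc1ccccc1C(=O)O",
    "service_c": None,
}
print(consensus_structure(aspirin_hits))  # CC(=O)Oc1ccccc1C(=O)O
```

Raising `min_agreement` trades coverage for reliability: fewer molecules get a structure, but the ones that do are backed by more sources.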
Use the parallelized version, find_multiple_molecules_parallelized, to retrieve multiple molecules. If a large number of molecules is searched for, moleculeresolver tries to use batch downloads whenever the database supports them.
```python
import json

from moleculeresolver import MoleculeResolver

molecule_names = ["aspirin", "propanol", "ibuprofen", "non-existent-name"]
not_found_molecules = []
molecules_dicts = {}

with MoleculeResolver(available_service_API_keys={"chemeo": "YOUR_API_KEY"}) as mr:
    # Search every entry by name.
    molecules = mr.find_multiple_molecules_parallelized(molecule_names, [["name"]] * len(molecule_names))
    for name, molecule in zip(molecule_names, molecules):
        if molecule:
            molecules_dicts[name] = molecule.to_dict(found_molecules='remove')
        else:
            not_found_molecules.append(name)

# Save the resolved structures and report the misses.
with open("molecules.json", "w") as json_file:
    json.dump(molecules_dicts, json_file, indent=4)

print(f"Molecules not found: {not_found_molecules}")
```
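The batching and parallelization described above can be pictured as splitting the identifier list into chunks and sending the chunks concurrently. This is a generic sketch of that idea, not moleculeresolver's internals: `fetch_batch` is a hypothetical stand-in that fakes a response so the example runs offline, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, batch_size):
    """Split `items` into consecutive lists of at most `batch_size` elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def fetch_batch(identifiers):
    """Hypothetical stand-in for one batch request to a database.

    A real client would send all identifiers in a single request; here a fake
    result is returned so the sketch runs without network access.
    """
    return {identifier: f"<structure for {identifier}>" for identifier in identifiers}


def resolve_all(identifiers, batch_size=2, max_workers=4):
    batches = chunked(identifiers, batch_size)
    results = {}
    # Send batches concurrently; executor.map yields results in batch order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for batch_result in executor.map(fetch_batch, batches):
            results.update(batch_result)
    return results


names = ["aspirin", "propanol", "ibuprofen", "non-existent-name"]
print(chunked(names, 2))  # [['aspirin', 'propanol'], ['ibuprofen', 'non-existent-name']]
print(resolve_all(names))
```

Threads suit this workload because the time is spent waiting on network I/O rather than computing.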
The MoleculeResolver class allows users to configure various options, such as the API keys passed through available_service_API_keys in the examples above.
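Instead of hard-coding a key in the source, the `available_service_API_keys` dict can be assembled at runtime. In this minimal sketch the environment variable name `CHEMEO_API_KEY` and the helper `load_api_keys` are hypothetical choices for illustration; moleculeresolver is not documented here to read any environment variable itself.

```python
import os


def load_api_keys(env_vars=None):
    """Build an available_service_API_keys dict from environment variables.

    `env_vars` maps a service name to the (hypothetical) environment variable
    holding its key. Services whose variable is unset or empty are skipped.
    """
    if env_vars is None:
        env_vars = {"chemeo": "CHEMEO_API_KEY"}
    keys = {}
    for service, variable in env_vars.items():
        value = os.environ.get(variable)
        if value:  # skip services without a configured key
            keys[service] = value
    return keys


api_keys = load_api_keys()
print(sorted(api_keys))  # services that currently have a key configured
# Usage with moleculeresolver installed:
# with MoleculeResolver(available_service_API_keys=api_keys) as mr:
#     ...
```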
Contributions are welcome! If you have suggestions for improving the Molecule Resolver or want to add new features, feel free to submit an issue or a pull request on GitHub.