alex-l-m/cod2mol - Githubissues

Given paper doi's, downloads structures from the Crystallography Open Database. Will also download from the Cambridge Structure Database, if the user has a licensed installation of their Python API. Individual molecules from each structure are outputted as .mol files. A single molecule is operationally defined as a connected component of the bonding graph, using the bonding information in the .cif file from the database.

doi's are read from stdin.

Downloading molecules from one paper:

echo "[doi]" | python cod2mol.py

Downloading molecules from multiple papers, using a file containing each doi on a separate row:

python cod2mol.py < [name of file]

The doi will be extracted from each row, and rows without a doi will be skipped, so it's okay to have, for example, blank lines, or doi.org urls.

Information on the downloaded molecules is included in "output_table.csv", which has columns "doi", "database", "entry", "molecule", and "filename". If "output_table.csv" already exists in the working directory, it will be appended to, and any doi already in it will not be redownloaded.

alex-l-m / cod2mol

readme