alex-l-m / cod2mol

Download organometallic crystal structures from the Crystallography Open Database by doi, and extract molecules from the crystal as mol files. Also downloads from the Cambridge Structural Database if their proprietary Python API is available
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Given paper doi's, downloads structures from the Crystallography Open Database. Will also download from the Cambridge Structure Database, if the user has a licensed installation of their Python API. Individual molecules from each structure are outputted as .mol files. A single molecule is operationally defined as a connected component of the bonding graph, using the bonding information in the .cif file from the database.

doi's are read from stdin.

Downloading molecules from one paper:

echo "[doi]" | python cod2mol.py

Downloading molecules from multiple papers, using a file containing each doi on a separate row:

python cod2mol.py < [name of file]

The doi will be extracted from each row, and rows without a doi will be skipped, so it's okay to have, for example, blank lines, or doi.org urls.

Information on the downloaded molecules is included in "output_table.csv", which has columns "doi", "database", "entry", "molecule", and "filename". If "output_table.csv" already exists in the working directory, it will be appended to, and any doi already in it will not be redownloaded.