Command line recipes for the working chemoinformatician
obabel INFILE -O OUTFILE --unique
obabel INFILE -O OUTFILE --partialcharge mmff94
obabel in.smi -O out.mol2 --partialcharge mmff94 -p 7.4
cxcalc -g majortautomer -H 7.4 -f sdf input.smiles > output_taut74.sdf
mayachemtools/bin/MACCSKeysFingerprints.pl --size 166 [INFILE] --CompoundIDMode MolName
corina [-d wh] < INPUT.sdf > OUTPUT.sdf
cxcalc conformers in.smi -m 1 > out.sdf
omega -in in.smi -out out.sdf -maxconfs 1
fconv -rmsd current.mol2 --s=reference.mol2
stripper --in molecules.smi --out scaffolds.txt
obabel molecule.smi -O molecule.svg
inkscape molecule.svg -E molecule.eps --export-ignore-filters --export-ps-level=3
# librsvg2-bin provides rsvg-convert
# texlive-extra-utils provides pdfcrop
# ghostscript provides pdf2ps
# ps2eps provides ps2eps
function svg2eps () {
    # keep helper variables local and quote them so paths with spaces work
    local svg="$1"
    local tmp_pdf_out pdf_out ps_out eps_out
    tmp_pdf_out=$(echo "$svg" | sed 's/\.svg$/_tmp.pdf/')
    pdf_out=$(echo "$svg" | sed 's/\.svg$/.pdf/')
    ps_out=$(echo "$svg" | sed 's/\.svg$/.ps/')
    eps_out=$(echo "$svg" | sed 's/\.svg$/.eps/')
    rsvg-convert -f pdf "$svg" -o "$tmp_pdf_out"
    pdfcrop "$tmp_pdf_out" "$pdf_out"
    pdf2ps "$pdf_out" "$ps_out"
    ps2eps < "$ps_out" > "$eps_out"
}
# openbabel provides obabel
function smi2eps () {
    local smi="$1"
    local svg_out
    svg_out=$(echo "$smi" | sed 's/\.smi$/.svg/')
    obabel "$smi" -O "$svg_out" -xC -xd
    svg2eps "$svg_out"
}
wget https://github.com/openbabel/openbabel/archive/openbabel-2-4-1.tar.gz
tar xzf openbabel-2-4-1.tar.gz
cd openbabel-openbabel-2-4-1/
mkdir build
cd build
cat <<EOF > build.sh
mkdir -p ~/usr
cmake -DPYTHON_BINDINGS=true -DCMAKE_INSTALL_PREFIX:PATH=$HOME/usr ../
EOF
chmod 755 build.sh
./build.sh
make -j4
make install
brew tap rdkit/rdkit
brew install rdkit --with-python3 --with-inchi
If this does not work, install via conda instead (RDKit will then only be usable from within that conda environment):
wget -c https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
sh Miniconda3-latest-MacOSX-x86_64.sh -p ~/usr/miniconda3
~/usr/miniconda3/bin/conda install -q -y -c conda-forge rdkit
Now you should check that you can really use it from Python:
python3
import rdkit
from rdkit import Chem
m = Chem.MolFromSmiles('n1ccccc1')
Store this in a 'molcount' script, somewhere on your PATH.
#!/bin/bash
# Print the number of molecules in each file, based on its extension.
for f in "$@"; do
    filename=$(basename "$f")
    extension="${filename##*.}"
    case "$extension" in
        mol2) grep -c MOLECULE "$f"
              ;;
        plr) grep -c '^END$' "$f" # position and contrib per atom to cLogP
             ;;
        pqr) grep -c '^COMPND' "$f"
             ;;
        sdf) grep -c '^\$\$\$\$' "$f"
             ;;
        mol) grep -c '^\$\$\$\$' "$f"
             ;;
        phar) grep -c '^\$\$\$\$' "$f" # Pharao DB
              ;;
        smi) wc -l < "$f"
             ;;
        *) echo "molcount: unsupported file format: .$extension ($f)" >&2
           ;;
    esac
done
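To see what the sdf and smi branches of the script above actually count, here is a minimal sketch on two toy files. The file names and contents are made up for illustration, and the toy SDF holds only names and `$$$$` record delimiters rather than real connection tables.

```shell
#!/bin/bash
# Hypothetical toy inputs, created in a throwaway directory.
tmpdir=$(mktemp -d)

# Two records, each terminated by the SDF delimiter line '$$$$'.
printf 'mol1\n$$$$\nmol2\n$$$$\n' > "$tmpdir/toy.sdf"
# Three SMILES lines, one molecule per line.
printf 'CCO ethanol\nc1ccccc1 benzene\nCC(=O)O acetic_acid\n' > "$tmpdir/toy.smi"

# Same idea as the sdf branch: count the delimiter lines.
sdf_count=$(grep -c '^\$\$\$\$' "$tmpdir/toy.sdf")
# Same idea as the smi branch: count lines (tr strips the padding BSD wc adds).
smi_count=$(wc -l < "$tmpdir/toy.smi" | tr -d ' ')

echo "sdf: $sdf_count molecules, smi: $smi_count molecules"
rm -rf "$tmpdir"
```

Because every branch is a single pass of grep or wc over the file, the count stays cheap even for very large files.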
Works even with a "database" file with millions of molecules.
lbvs_consent_mol_get from https://github.com/UnixJunkie/consent
lbvs_consent_mol_get -i molecules.{sdf|mol2|smi} {-names "mol1,mol2,..."|-f names_file}
A kind of canonicalization of molecular representations, consisting of the pair:
Sayle_hash(m) = (Canonical_smile_forcing_only_single_bonds_and_noH(m), number_of_Hydrogens_on_non_carbons(m) - sum_of_formal_charges(m))
m being the molecule to hash.
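As a worked illustration of the definition above (my reading of it, not output from any tool), the hash can be evaluated by hand for two pairs of molecules. The single-bond, hydrogen-free SMILES strings below are written by hand and are an assumption; a real canonicalizer may order the atoms differently.

```latex
% Hand-evaluated instances of Sayle_hash(m) = (skeleton SMILES, nH_on_heteroatoms - sum_formal_charges).
% SMILES spellings are illustrative, not the output of a particular canonicalizer.
\begin{align*}
\text{acetic acid, } \texttt{CC(=O)O}:\quad
  & \mathrm{Sayle\_hash} = (\texttt{CC(O)O},\; 1 - 0) = (\texttt{CC(O)O},\; 1) \\
\text{acetate, } \texttt{CC(=O)[O-]}:\quad
  & \mathrm{Sayle\_hash} = (\texttt{CC(O)O},\; 0 - (-1)) = (\texttt{CC(O)O},\; 1) \\
\text{2-hydroxypyridine, } \texttt{Oc1ccccn1}:\quad
  & \mathrm{Sayle\_hash} = (\texttt{OC1CCCCN1},\; 1 - 0) = (\texttt{OC1CCCCN1},\; 1) \\
\text{2-pyridone, } \texttt{O=c1cccc[nH]1}:\quad
  & \mathrm{Sayle\_hash} = (\texttt{OC1CCCCN1},\; 1 - 0) = (\texttt{OC1CCCCN1},\; 1)
\end{align*}
```

Under this reading, the two protonation states of acetic acid share one hash, and so do the two tautomers of 2-pyridone, which is presumably why the pair is useful as a canonical key.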
No GPU support, but at least it's a simple, automatic install procedure. DeepChem is pinned to a version that works for what I currently do.
pip3 install joblib pandas scikit-learn tensorflow pillow simdna deepchem==2.1.1.dev353
pip3 install chemo-standardizer
opam install pardi
#!/bin/bash
# both the input and the output file are required
if [ $# -lt 2 ]; then
    echo "usage: $0 input.smi output_std.smi"
    exit 1
fi
INPUT="$1"
OUTPUT="$2"
pardi -i "$INPUT" -o "$OUTPUT" -c 400 -d l -ie '.smi' -oe '.smi' \
      -w 'standardiser -i %IN -o %OUT 2>/dev/null'
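The script above hands the input to pardi in chunks of lines and lets it run one command per chunk. The same chunk, process and concatenate pattern can be sketched with standard tools only. This is not how pardi is implemented: `split` and `xargs -P` are swapped in for it, `tr` is a stand-in for the real per-chunk command, and the file names are hypothetical.

```shell
#!/bin/bash
workdir=$(mktemp -d)

# Toy input: 10 lines standing in for SMILES records (mol1 .. mol10).
seq 1 10 | sed 's/^/mol/' > "$workdir/input.smi"

# Cut the input into chunks of 4 lines (the role played by '-c 400' above).
split -l 4 "$workdir/input.smi" "$workdir/chunk_"

# Process up to 2 chunks at a time; 'tr' stands in for the real worker command.
printf '%s\n' "$workdir"/chunk_* \
    | xargs -P 2 -I{} sh -c 'tr a-z A-Z < "$1" > "$1.out"' _ {}

# Concatenate the per-chunk outputs; the glob sorts them in chunk order.
cat "$workdir"/chunk_*.out > "$workdir/output.smi"

result=$(cat "$workdir/output.smi")
echo "$result"
rm -rf "$workdir"
```

Writing each chunk to its own output file and concatenating afterwards keeps the records in input order no matter which worker finishes first. Whether pardi itself preserves order by default is not something this sketch claims.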