I've written a better version of Free-Wilson analysis

I wrote this version of Free-Wilson in 2018. Since then, many of the libraries I used have improved. Hopefully, my coding skills have also improved. I'd recommend using the new version and not this one. Please check out the Free-Wilson notebook in my Practical Cheminformatics Tutorials repo.

Free Wilson Analysis

This code provides a Python implementation of the Free-Wilson SAR analysis method as described in

Spencer M. Free and James W. Wilson
A mathematical contribution to structure-activity studies
Journal of Medicinal Chemistry 7.4 (1964): 395-399.
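To make the method concrete, here is a minimal, dependency-free sketch of the idea behind Free-Wilson analysis: each (position, R-group) pair becomes a 0/1 descriptor column, and a linear fit assigns each R-group an additive contribution to activity. This is not the code in this repository. The function names, the toy data, and the hand-rolled ridge solve are all assumptions made for illustration; the dependency list suggests the real implementation relies on scikit-learn instead.

```python
def one_hot_rgroups(rgroup_table):
    """Encode each (position, R-group) pair as one 0/1 descriptor column."""
    columns = sorted({(pos, group)
                      for row in rgroup_table
                      for pos, group in enumerate(row)})
    index = {col: i for i, col in enumerate(columns)}
    matrix = []
    for row in rgroup_table:
        vec = [0.0] * len(columns)
        for pos, group in enumerate(row):
            vec[index[(pos, group)]] = 1.0
        matrix.append(vec)
    return columns, matrix


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_free_wilson(rgroup_table, activities, alpha=1e-6):
    """Return (intercept, {(position, group): contribution}) from a ridge fit."""
    columns, x = one_hot_rgroups(rgroup_table)
    # The intercept is taken as the mean activity; contributions are fit
    # to the mean-centered activities.
    intercept = sum(activities) / len(activities)
    y = [a - intercept for a in activities]
    k = len(columns)
    # Normal equations with a ridge penalty: (X^T X + alpha * I) w = X^T y.
    # The penalty keeps the system solvable even though the one-hot columns
    # of each position always sum to one.
    xtx = [[sum(row[i] * row[j] for row in x) + (alpha if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(k)]
    weights = solve_linear(xtx, xty)
    return intercept, dict(zip(columns, weights))


def predict(intercept, contributions, rgroups):
    """Predicted activity = intercept + sum of the R-group contributions."""
    return intercept + sum(contributions[(pos, g)] for pos, g in enumerate(rgroups))


# Toy, made-up data: (R1, R2) substituents with a purely additive activity.
table = [("H", "H"), ("Cl", "H"), ("H", "OMe"), ("Cl", "OMe")]
acts = [5.0, 6.0, 5.5, 6.5]
intercept, contributions = fit_free_wilson(table, acts)
for (pos, group), value in sorted(contributions.items()):
    print(f"R{pos + 1} {group:>4s} {value:+.2f}")
```

Once the contributions are known, the activity of any unsynthesized R-group combination can be estimated with `predict`, which is the idea behind the enumeration step.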


  1. This software requires an Anaconda environment with the RDKit (at least 2018.3) installed.
    The code uses a lot of type hints, so you'll probably need at least Python 3.6.

  2. Install dependencies

    pip install docopt tqdm pyfancy scikit-learn scipy joblib
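Before running the demo, it can help to confirm that the environment is complete. The snippet below is a small sketch, not part of this repository: `REQUIRED_MODULES` and both helper functions are hypothetical names, and the import names are assumed to match the packages listed above (scikit-learn is imported as `sklearn`).

```python
import sys
from importlib.util import find_spec

# Assumed import names for the packages mentioned above.
REQUIRED_MODULES = ["rdkit", "docopt", "tqdm", "pyfancy", "sklearn", "scipy", "joblib"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


def python_is_new_enough(minimum=(3, 6)):
    """Check the interpreter against the minimum version needed for type hints."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_is_new_enough():
        print("Python 3.6 or newer is probably required")
    for name in missing_modules(REQUIRED_MODULES):
        print(f"missing module: {name}")
```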


A detailed tutorial introduction can be found at

As a demo, go to the data directory and run the command below

../ all --scaffold scaffold.mol --in fw_mols.smi --act fw_act.csv --prefix test

Quite a few other options are also available.

    Usage:
    all --scaffold SCAFFOLD_MOLFILE --in INPUT_SMILES_FILE --prefix JOB_PREFIX --act ACTIVITY_FILE [--smarts R_GROUP_SMARTS] [--max MAX_SPEC] [--log]
    rgroup --scaffold SCAFFOLD_MOLFILE --in INPUT_SMILES_FILE --prefix JOB_PREFIX [--smarts R_GROUP_SMARTS]
    regression --desc DESCRIPTOR_FILE --act ACTIVITY_FILE --prefix JOB_PREFIX [--log]
    enumeration --scaffold SCAFFOLD_MOLFILE --model MODEL_FILE --prefix JOB_PREFIX [--max MAX_SPEC]

--scaffold SCAFFOLD_MOLFILE molfile with labeled R-groups
--prefix JOB_PREFIX job prefix
--act ACTIVITY_FILE activity column should be labeled "Act" in the header
--model MODEL_FILE name of the model file created by the "regression" command
--smarts R_GROUP_SMARTS SMARTS pattern to restrict R-group when the scaffold is symmetric
--max MAX_SPEC maximum number of R-groups to enumerate, specified as a string for R1,R2,R3, e.g. "a|2,5,8"
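Since the activity file must carry an "Act" column in its header, a quick check before a long run can save time. The sketch below is an illustration only: `read_activities` is a hypothetical helper, and the identifier column name "Name" and the comma delimiter are assumptions, not something this README specifies.

```python
import csv
import io


def read_activities(handle, id_column="Name", act_column="Act"):
    """Map compound identifier -> activity, requiring the expected header columns."""
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    for column in (id_column, act_column):
        if column not in fieldnames:
            raise ValueError(f"missing column {column!r}; found {fieldnames}")
    return {row[id_column]: float(row[act_column]) for row in reader}


# Made-up example content standing in for an activity file.
example = io.StringIO("Name,Act\nMOL0001,7.2\nMOL0002,6.1\n")
activities = read_activities(example)
print(activities)
```

With a real file, the same function can be called as `read_activities(open(path, newline=""))`, where `path` points at the file passed to `--act`.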