Authors: Ian Pendleton, Michael Tynes, Aaron Dharna
Science Contact: jschrier .at. fordham.edu, ian .at. pendletonian.com
Technical Debugging: vshekar .at. haverford.edu, gcattabrig .at. haverford.edu
Retrieves experiment files from supported locations and processes them into intermediary JSON files on the user's local machine. The generated JSON files are used to build a 2d CSV of the data in a format compatible with most machine learning software (e.g. scikit-learn). Additional configuration is required to map the existing data structures to headers that resemble the user's desired configuration. These mappings are typically trivial for computer scientists, but may be more challenging for non-domain experts or individuals unfamiliar with manipulating dataframes. The dataset is augmented with chemical calculations such as concentrations, temperatures derived from models of plate temperature, and other empirical observations. In the final steps, the dataset is supplemented with chemical features and calculations derived from ChemAxon, RDKit, and local datasets saved to this repository. Additional information on how to control the generation of _feat_ and _calc_ columns can be found in the user documentation here.
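To make the JSON-to-CSV step concrete, here is a minimal Python sketch of flattening nested per-run JSON records into one 2d CSV. It assumes each run is a nested JSON object and that nested keys are joined with underscores; the actual ESCALATE_report JSON schema and column naming are not described here, and `flatten` and `json_runs_to_csv` are hypothetical helpers, not functions from this repository.

```python
import csv
import io
import json


def flatten(record, parent_key="", sep="_"):
    """Recursively flatten nested dicts into a single-level dict.

    Keys are joined with `sep`, e.g. {"rxn": {"temp": 80}} -> {"rxn_temp": 80}.
    """
    items = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep=sep))
        else:
            items[new_key] = value
    return items


def json_runs_to_csv(json_texts):
    """Turn a list of per-run JSON strings into one 2d CSV string (hypothetical helper)."""
    rows = [flatten(json.loads(text)) for text in json_texts]
    # Union of all columns in first-seen order, so runs with extra fields still fit.
    header = []
    for row in rows:
        for key in row:
            if key not in header:
                header.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=header)  # missing cells are written as ""
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Taking the union of columns means a run that records an extra field simply leaves that cell blank for the other runs, which keeps the table rectangular for tools like scikit-learn.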
The original ESCALATE publication can be found here.
User documentation covering a complete ESCALATE cycle can be found here.
This build process has been tested on macOS High Sierra (10.13.5), macOS Catalina (10.15.3), Ubuntu Bionic Beaver (18.04), and Windows 10 (version 1909, OS Build 18363.418).
Windows users: please note that while Windows has been tested, it is not the recommended operating system. The installation is messier, logging is limited, and file-system interaction is more brittle.
Create a new Python 3.8 conda environment and activate it:
conda create -n escalate_report python=3.8
conda activate escalate_report
Install the latest version of the pip package manager:
conda install pip
Then install the requirements (still inside escalate_report):
pip install -r requirements.txt
Then install the conda-dependent pieces:
conda install -c conda-forge rdkit
Alternatively, execute:
conda update conda
conda env create -f environment.yml
The conda env create command will automatically create an escalate_report environment.
Then install the following package with conda prior to use:
conda install -c conda-forge rdkit
Please report any failures of the above commands to the repo admins.
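After either installation route, a quick sanity check can confirm that the active interpreter is Python 3.8 and that RDKit is importable before you report a failure. This is a hypothetical helper (`check_environment` is not part of the repository); run it with `python` inside the escalate_report environment.

```python
import importlib.util
import sys


def check_environment(expected=(3, 8), packages=("rdkit",)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:2] != tuple(expected):
        found = ".".join(map(str, sys.version_info[:2]))
        problems.append(f"Python {expected[0]}.{expected[1]} expected, found {found}")
    for name in packages:
        # find_spec returns None when a top-level package cannot be imported
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "escalate_report environment looks OK")
```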
Download the secure key files and move them into the root folder (./, a.k.a. the current working directory, a.k.a. ESCALATE_report-master/ if downloaded from git). Do not distribute these keys! (Contact a dev for access.)
Ensure that the files 'client_secrets.json' and 'creds.json' are both present in the root folder (./, a.k.a. the current working directory, a.k.a. ESCALATE_report-master/ if downloaded from git). The correct folder for these keys is the one that contains the runme.py script.
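A small pre-flight check along these lines can catch misplaced keys before a run. It is a sketch: `missing_key_files` is a hypothetical helper that treats the folder containing runme.py as the root, as described above.

```python
from pathlib import Path

REQUIRED_KEYS = ("client_secrets.json", "creds.json")


def missing_key_files(root="."):
    """List the secure key files absent from `root` (the folder holding runme.py)."""
    root = Path(root)
    if not (root / "runme.py").is_file():
        raise FileNotFoundError(f"{root.resolve()} does not contain runme.py")
    return [name for name in REQUIRED_KEYS if not (root / name).is_file()]
```

Running `print(missing_key_files())` from the repository root should print an empty list once both keys are in place.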
Stop here if you don't want to use ChemAxon for feature generation; RDKit and the available ESCALATE features will still be generated. Any ChemAxon entries in type_command.csv can be ignored if that is the desired functionality.
Download and install ChemAxon JChemSuite and obtain a ChemAxon license (free for academic use).
Follow the installation instructions found on ChemAxon's website. Be sure to note the location of the JChemSuite installation (e.g. ~/opt/chemaxon/jchemsuite/bin on Linux or /Applications/JChemSuite/bin/ on macOS).
You will need to specify the location of your ChemAxon installation at the bottom of ./expworkup/devconfig.py.
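Before editing ./expworkup/devconfig.py, it can help to confirm that the folder you noted actually contains ChemAxon's `cxcalc` command-line tool. The sketch below checks a few assumed executable names; `find_cxcalc` is hypothetical, and the variable name devconfig.py expects should be taken from the bottom of that file, not from this example.

```python
import os
from pathlib import Path


def find_cxcalc(bin_dir):
    """Return the full path to ChemAxon's cxcalc inside `bin_dir`, or None.

    `bin_dir` is the JChemSuite bin folder noted during installation, e.g.
    ~/opt/chemaxon/jchemsuite/bin (Linux) or /Applications/JChemSuite/bin/ (macOS).
    """
    bin_dir = Path(os.path.expanduser(str(bin_dir)))
    # Assumed executable names; a Windows install may use a suffixed name.
    for name in ("cxcalc", "cxcalc.bat", "cxcalc.exe"):
        candidate = bin_dir / name
        if candidate.is_file():
            return str(candidate)
    return None
```

If `find_cxcalc("~/opt/chemaxon/jchemsuite/bin")` returns None, the path you are about to put into devconfig.py is probably wrong.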
Currently supported google_drive_target_name values (user-defined folder names):
A more detailed instruction manual, including videos showing how to operate the code, can be found in the ESCALATE user manual.
Definitions
<my_local_folder>
: the name of the folder where files should be created. This folder will be created automatically by ESCALATE_report if it does not exist. The specified name is also used for the final exported CSV (i.e. <my_local_folder>.csv).
<google_drive_target_name>
: one or more of the available datasets. See the examples below.
You can always get runtime information by executing:
python runme.py --help
To execute a normal run with ChemAxon, RDKit, and ESCALATE calcs (see the installation instructions above for details):
python runme.py <my_local_folder> -d <google_drive_target_name>
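For orientation, the options used throughout this README can be summarized as an `argparse` parser. This is an illustrative reconstruction, not the parser in runme.py: the pairing of `-v`/`--verdata` and `-s`/`--state`, the `targets` destination name, and all defaults are assumptions. `python runme.py --help` remains the authoritative reference.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the runme.py options described in this README."""
    parser = argparse.ArgumentParser(prog="runme.py")
    parser.add_argument("local_folder",
                        help="<my_local_folder>: output folder; also names the final CSV")
    parser.add_argument("-d", dest="targets", nargs="+", required=True, metavar="TARGET",
                        help="<google_drive_target_name>: one or more datasets")
    # Integer switches used as `--raw 1`, `--debug 1`, `--offline 1` (defaults assumed).
    parser.add_argument("--raw", type=int, default=0)
    parser.add_argument("--debug", type=int, default=0)
    parser.add_argument("--offline", type=int, default=0)
    parser.add_argument("--disablecalcs", default="False")
    # Versioned-data options; short/long pairing is an assumption.
    parser.add_argument("-v", "--verdata", default=None)
    parser.add_argument("-s", "--state", default=None)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```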
To improve the clarity of column headers, specify new names in the dataset_rename.json file. All columns can be viewed during an initial run by executing:
python runme.py <my_local_folder> -d <google_drive_target_name> --raw 1
Columns that do not conform to the _{category}_ naming convention (e.g. _feat_, _rxn_) will be omitted unless --raw 1 is enabled!
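The renaming and filtering behaviour described above can be sketched as follows. It assumes dataset_rename.json is a flat `{"old_name": "new_name"}` object and that a conforming header starts with a `_{category}_` prefix; both are assumptions about the file format, and `select_columns` is a hypothetical helper.

```python
import json
import re

# Assumed header convention: a leading _{category}_ prefix such as _feat_ or _rxn_.
CATEGORY_PREFIX = re.compile(r"^_[A-Za-z0-9]+_")


def select_columns(columns, rename_json_text="{}", raw=False):
    """Rename headers via a flat {"old": "new"} mapping, then drop
    non-conforming columns unless raw is True (mirroring --raw 1)."""
    mapping = json.loads(rename_json_text)
    renamed = [mapping.get(col, col) for col in columns]
    if raw:
        return renamed
    return [col for col in renamed if CATEGORY_PREFIX.match(col)]
```

Note the order: renaming happens first, so a raw header can be rescued from omission by mapping it to a name with a category prefix.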
--raw 1
Significant flexibility is available for _feat_ (via type_command.csv) and _calc_ (via ./utils/calc_command.py) specification. For examples, discussion, and limitations of these specifications, please see the USER docs.
_calc_ generation can be skipped by passing the --disablecalcs True flag on the CLI.
--offline 1
--offline 2
A file named <my_local_folder>.csv will contain the 2d CSV of the dataset, using the configured headers from the data or the mapping developed for the lab. The data/ folder will contain the generated JSONs.
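Because the exported CSV is meant for machine learning software such as scikit-learn, a typical next step is splitting it into a feature matrix and a target. The standard-library sketch below assumes feature columns start with `_feat_` or `_rxn_`, that their values are numeric, and that you supply the target column name yourself; `load_feature_matrix` is a hypothetical helper.

```python
import csv


def load_feature_matrix(csv_path, target_column, prefixes=("_feat_", "_rxn_")):
    """Read <my_local_folder>.csv and return (feature_cols, X, y).

    X holds the columns whose names start with one of `prefixes`, converted
    to float (empty cells become NaN); y holds the raw `target_column` values.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        feature_cols = [c for c in reader.fieldnames if c.startswith(prefixes)]
        X, y = [], []
        for row in reader:
            X.append([float(row[c]) if row[c] != "" else float("nan") for c in feature_cols])
            y.append(row[target_column])
    return feature_cols, X, y
```

The returned lists can be passed to scikit-learn estimators, e.g. `model.fit(X, y)`, provided the estimator tolerates NaN or the gaps are imputed first.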
Intermediate dataframes can be exported in bulk by specifying:
python runme.py <my_local_folder> -d <google_drive_target_name> --debug 1
To add additional target directories please see the how-to guide here. If you would like to add these to the existing datasets, please issue a git merge request after you add the necessary information.
More detailed instructions can be found in the ESCALATE user manual.
If you are using Windows 10, please follow these instructions on what you will need to set up your environment. Consider using Ubuntu or WSL instead!
1. Ensure that the versioned data repo and escalation are installed.
2. Create an issue on the versioned data repo with the new crank-number.
3. python runme.py <my_local_folder> -d <google_drive_target_name> -v <crank-number>
4. This will generate files for upload to the versioned data repo with the names <crank-number>.<dataset-name>.csv and <crank-number>.<dataset-name>.index.csv.
5. Move these files to /pathto/versioned-dataset/data/perovskite/<my_local_folder>
6. Follow the Readme.md instructions for versioned-datasets.
7. To include a state-set file with the crank: python runme.py <my_local_folder> -d <google_drive_target_name> -v <crank-number> -s <state-set_file_name.csv>
8. Follow steps 5-6 above.
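The file naming and move described above amount to building two file names and moving them into the versioned-dataset checkout. A sketch, assuming the generated files sit in `source_dir` (the current directory by default) and that `/pathto/versioned-dataset` is replaced with your own checkout path; `versioned_filenames`, `destination_dir`, and `stage_for_upload` are hypothetical helpers.

```python
import shutil
from pathlib import Path


def versioned_filenames(crank_number, dataset_name):
    """Return the two file names produced for the versioned data repo."""
    stem = f"{crank_number}.{dataset_name}"
    return f"{stem}.csv", f"{stem}.index.csv"


def destination_dir(versioned_repo, local_folder):
    """Target folder: <versioned-dataset>/data/perovskite/<my_local_folder>."""
    return Path(versioned_repo) / "data" / "perovskite" / local_folder


def stage_for_upload(crank_number, dataset_name, versioned_repo, local_folder, source_dir="."):
    """Move both generated files into the versioned-dataset checkout."""
    dest = destination_dir(versioned_repo, local_folder)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in versioned_filenames(crank_number, dataset_name):
        moved.append(Path(shutil.move(str(Path(source_dir) / name), str(dest / name))))
    return moved
```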
python runme.py 4-Data-Iodides -d 4-Data-Iodides
python runme.py 4-Data-Iodides -d 4-Data-Iodides 4-Data-WF3_Iodide 4-Data-WF3_Alloying
python runme.py dev -d dev --debug 1 --raw 1 --offline 1
python runme.py perovskitedata -d 4-Data-Iodides --verdata 0111 --state example.csv