Things that you should (and should not) do in your Materials Informatics research.
This repository contains the Python code and Jupyter notebooks accompanying the publication "Machine Learning for Materials Scientists: An Introductory Guide toward Best Practices".
These notebooks illustrate a hypothetical Machine Learning project in materials science, created following best practices. The goal of this project is to predict the heat capacity of materials given a chemical composition and a condition (the measurement temperature).
To read the main publication for which these notebooks are made, please see:
Wang, Anthony Yu-Tung; Murdock, Ryan J.; Kauwe, Steven K.; Oliynyk, Anton O.; Gurlo, Aleksander; Brgoch, Jakoah; Persson, Kristin A.; Sparks, Taylor D., Machine Learning for Materials Scientists: An Introductory Guide toward Best Practices, Chemistry of Materials 2020, 32 (12): 4954–4965. DOI: 10.1021/acs.chemmater.0c01907.
Please cite the following work if you choose to adopt or adapt the methods mentioned in this Methods/Protocols article:
Wang, Anthony Yu-Tung; Murdock, Ryan J.; Kauwe, Steven K.; Oliynyk, Anton O.; Gurlo, Aleksander; Brgoch, Jakoah; Persson, Kristin A.; Sparks, Taylor D., Machine Learning for Materials Scientists: An Introductory Guide toward Best Practices, Chemistry of Materials 2020, 32 (12): 4954–4965. DOI: 10.1021/acs.chemmater.0c01907.
Citation in BibTeX format:
@article{Wang2020bestpractices,
author = {Wang, Anthony Yu-Tung and Murdock, Ryan J. and Kauwe, Steven K. and Oliynyk, Anton O. and Gurlo, Aleksander and Brgoch, Jakoah and Persson, Kristin A. and Sparks, Taylor D.},
year = {2020},
title = {Machine Learning for Materials Scientists: An Introductory Guide toward Best Practices},
url = {https://doi.org/10.1021/acs.chemmater.0c01907},
pages = {4954--4965},
volume = {32},
number = {12},
issn = {0897-4756},
journal = {Chemistry of Materials},
doi = {10.1021/acs.chemmater.0c01907}
}
This repository hosts a series of Jupyter notebooks, which run on the Python programming language. Please follow the steps below to get started with these notebooks.
Do one of the following:
- Using conda and the provided `conda-env.yml` file:
  1. `conda env create --file conda-env.yml`
  2. `conda activate bestpractices` (`bestpractices` is the name of the environment)

  For more information about creating, managing, and working with Conda environments, please consult the relevant help page.
- Using `pip`: Open `conda-env.yml` and `pip install` all of the packages listed there. We recommend that you create a separate Python environment for this project.
We will be using Jupyter notebooks to demonstrate some of the concepts and workflows described in the paper.
Jupyter notebooks provide an interactive development environment that displays your code, your code outputs (e.g., calculation results, processed data, visualizations), and other rich media (such as text, HTML, images, equations, even LaTeX!) together in a notebook-style document. Jupyter notebooks are commonly used in the machine learning field.
If you followed the steps above to create the `bestpractices` environment, the packages required to run Jupyter notebooks should already be installed.
In that case, you can start Jupyter by following these steps:
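1. In a terminal with the `bestpractices` environment activated, run `jupyter notebook`.
2. Jupyter should open in your web browser at `http://localhost:8888/?token=<token>` (or something similar).
3. Navigate to the `notebooks` directory and click on a Jupyter notebook to start it.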
Jupyter notebooks are composed of several types of "cells", the most important being code cells ("Code") and text cells ("Markdown"). To edit a code cell, click inside the cell and edit the code. To edit a text cell, double-click inside the cell and then edit the text; text cells support Markdown formatting.
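Under the hood, a notebook (`.ipynb`) file is a JSON document whose top-level `cells` list holds one entry per cell, each tagged with a `cell_type` of `"code"`, `"markdown"`, or `"raw"`. As a minimal sketch, assuming the nbformat 4 layout and using only the standard library, the hypothetical helper `count_cell_types` below tallies the cell types in each notebook under the `notebooks` directory.

```python
import json
from collections import Counter
from pathlib import Path


def count_cell_types(notebook: dict) -> Counter:
    """Tally the cells of a parsed .ipynb document by their "cell_type" field."""
    return Counter(cell.get("cell_type", "unknown") for cell in notebook.get("cells", []))


if __name__ == "__main__":
    notebooks_dir = Path("notebooks")  # assumes the script is run from the repository root
    if notebooks_dir.is_dir():
        for path in sorted(notebooks_dir.glob("*.ipynb")):
            with path.open(encoding="utf-8") as f:
                counts = count_cell_types(json.load(f))
            print(f"{path.name}: {dict(counts)}")
```

Inspecting the JSON directly like this is only for illustration; for editing notebooks, the Jupyter interface described here is the intended tool.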
You can navigate a Jupyter notebook by using your mouse or your keyboard arrow keys.
Some other handy keyboard shortcuts to know:
| Keyboard shortcut | Description |
|---|---|
| Ctrl + Enter | Run the contents of a cell |
| Shift + Enter | Run the contents of a cell, and then advance to the next cell |
| Ctrl + S | Save |
For more information about how to use Jupyter notebooks, you can consult the official Jupyter Notebook documentation as well as the wealth of available information online.
A Julia implementation can be found in the `pluto_notebooks` folder. Additional setup instructions are provided in the README file there. In general, much of the same workflow has been kept in place; the major difference is the use of equivalent Julia packages (e.g., DataFrames.jl in place of pandas).