Official implementation of ChargE3Net, introduced in Higher-Order Equivariant Neural Networks for Charge Density Prediction in Materials.
@article{koker2024higher,
title={Higher-order equivariant neural networks for charge density prediction in materials},
author={Koker, Teddy and Quigley, Keegan and Taw, Eric and Tibbetts, Kevin and Li, Lin},
journal={npj Computational Materials},
volume={10},
number={1},
pages={161},
year={2024},
publisher={Nature Publishing Group UK London}
}
Install dependencies in environment:
pip install -r requirements.txt
The src/train_from_config.py and src/test_from_config.py scripts can be run on multiple nodes and GPUs with the --multinode or -m flag. Our configs assume a configuration of 2 nodes, each with 2 GPUs. You will likely need to adjust the Slurm Launcher Parameters in the hydra.launcher section of the config to work with your system.
Alternatively, you can run the commands on a single-node system without the -m flag:
python src/<train,test>_from_config.py -cd configs/charge3net -cn <config name> nnodes=1 nprocs=<number of GPUs to use>
However, training models with a different number of nodes/GPUs may result in different performance, because the effective batch size changes.
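The batch-size effect mentioned above can be made concrete with a minimal sketch. It assumes plain data-parallel training with one process per GPU, no gradient accumulation, and that nprocs counts GPUs per node; the per-GPU batch size of 4 is a hypothetical value for illustration and is not taken from the configs.

```python
def effective_batch_size(per_gpu_batch: int, nnodes: int, nprocs: int) -> int:
    """Samples consumed per optimizer step under plain data parallelism.

    Assumes one process per GPU, each drawing its own batch of
    `per_gpu_batch` samples, with no gradient accumulation.
    """
    if min(per_gpu_batch, nnodes, nprocs) < 1:
        raise ValueError("all arguments must be positive integers")
    return per_gpu_batch * nnodes * nprocs


# Hypothetical per-GPU batch size of 4, for illustration only.
default_layout = effective_batch_size(4, nnodes=2, nprocs=2)  # 2 nodes x 2 GPUs
single_gpu = effective_batch_size(4, nnodes=1, nprocs=1)
print(default_layout, single_gpu)  # prints: 16 4
```

Under these assumptions, moving from the default 2x2 layout to a single GPU shrinks the effective batch by a factor of four unless the per-GPU batch size is raised to compensate.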
Given a directory containing CHGCAR files (they do not need to be at the top level of the directory), a set of ChargE3Net input files can be created with the following script:
python scripts/convert_chgcar_dir_to_pkl_dir.py --input <directory containing CHGCARs> --output <new directory> [--workers WORKERS]
NOTE: If there are more than 10 CHGCARs in the input directory, it is recommended to add additional workers. 5 workers is a good choice to start with.
This will add all necessary input files to the new directory specified by --output, with the following tree:
/path/to/charge3net_inputs/
├── filelist.txt
├── chgcar1.npy
├── chgcar1_atoms.pkl
├── chgcar2.npy
├── chgcar2_atoms.pkl
├── path_to_chgcar3.npy
├── path_to_chgcar3_atoms.pkl
├── probe_counts.csv
└── split.json
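As a quick sanity check on the converted directory, the sketch below looks for density arrays without a matching atoms file, and vice versa. It relies only on the <name>.npy / <name>_atoms.pkl naming shown in the tree above; the helper name unpaired_inputs is hypothetical and not part of the repository, and the check is meant to be run before inference, since the predictions are written into the same directory.

```python
import tempfile
from pathlib import Path

ATOMS_SUFFIX = "_atoms.pkl"


def unpaired_inputs(input_dir):
    """Return sorted names that have only one of <name>.npy / <name>_atoms.pkl."""
    root = Path(input_dir)
    density = {p.stem for p in root.glob("*.npy")}
    atoms = {p.name[: -len(ATOMS_SUFFIX)] for p in root.glob("*" + ATOMS_SUFFIX)}
    # Symmetric difference: names present in exactly one of the two sets.
    return sorted(density ^ atoms)


# Demo on a throwaway directory that mimics the tree above,
# with the atoms file for chgcar2 deliberately left out.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("chgcar1.npy", "chgcar1_atoms.pkl", "chgcar2.npy"):
        (Path(tmp) / name).touch()
    missing = unpaired_inputs(tmp)

print(missing)  # prints: ['chgcar2']
```

On a real run, call unpaired_inputs("/path/to/charge3net_inputs/"); an empty list means every density file has its atoms file.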
Now, the ChargE3Net model can be used to compute charge density, with this directory as input:
python src/test_from_config.py -cd configs/charge3net/ -cn test_chgcar_inputs.yaml input_dir=</path/to/charge3net_inputs/> -m
The model predictions will show up in the /path/to/charge3net_inputs/ directory.
Download datasets. See Datasets for instructions for each dataset.
Materials Project:
python src/test_from_config.py -cd configs/charge3net -cn train_mp_e3_final.yaml checkpoint_path=models/charge3net_mp.pt -m
QM9:
python src/test_from_config.py -cd configs/charge3net -cn train_qm9_e3_final.yaml checkpoint_path=models/charge3net_qm9.pt -m
NMC:
python src/test_from_config.py -cd configs/charge3net -cn train_nmc_e3_final.yaml checkpoint_path=models/charge3net_nmc.pt -m
Download datasets. See Datasets for instructions for each dataset.
Materials Project:
python src/train_from_config.py -cd configs/charge3net -cn train_mp_e3_final.yaml -m
QM9:
python src/train_from_config.py -cd configs/charge3net -cn train_qm9_e3_final.yaml -m
NMC:
python src/train_from_config.py -cd configs/charge3net -cn train_nmc_e3_final.yaml -m
python download/download_materials_project.py \
--out_path ./data/mp_raw \
--workers <number of workers> \
--task_id_file ./data/mpid_to_task_id_map.json \
--mp_api_key <MP API key>
Update 2024: The Materials Project API has updated its backend, and as a result some of the old task IDs are no longer accessible. We have added a flag to the above command, --download_latest_for_missing_task_id, which will download the latest calculations for these missing task IDs. Note that this will result in data that is slightly different from our published work.
Optionally, you can exclude the task_id_file to download the latest data from the Materials Project, including any updates to the dataset since we obtained a copy. For reproducible results, use the above call.
python download/download_materials_project.py \
--out_path ./data/mp_raw \
--workers WORKERS \
--mp_api_key <MP API key>
python scripts/batch_pickle_mp_charge_density.py --raw_data_dir ./data/mp_raw --pkl_data_dir ./data/mp/
These files are provided, but you can optionally reproduce them with the following scripts:
NOTE: The files produced by the scripts below may not yield exactly reproducible results, given changes to the Materials Project dataset. It is recommended to use the files provided in the data/ directory.
ls ./data/mp_raw -1 > ./data/mp/filelist.txt
python scripts/write_mp_probe_count_file.py --filelist ./data/mp_raw/filelist.txt --workers WORKERS
Download from Jørgensen and Bhowmik into data/qm9.
Create the filelist for training:
cd data/qm9
ls -1 *.tar > qm9vasp.txt
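Where a shell is not available, the filelist step above can be approximated in Python. This is a minimal sketch, not a script from the repository: write_filelist is a hypothetical helper, and sorted() orders names by code point, which can differ from the locale-dependent order of ls.

```python
import tempfile
from pathlib import Path


def write_filelist(data_dir, pattern, out_name):
    """Write the names of files in data_dir matching pattern, one per line."""
    root = Path(data_dir)
    # Exclude the output file itself in case it also matches the pattern.
    names = sorted(p.name for p in root.glob(pattern) if p.name != out_name)
    (root / out_name).write_text("".join(f"{n}\n" for n in names))
    return names


# Demo on a throwaway directory; only the .tar files are listed.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.tar", "a.tar", "notes.txt"):
        (Path(tmp) / name).touch()
    names = write_filelist(tmp, "*.tar", "filelist_demo.txt")
    written = (Path(tmp) / "filelist_demo.txt").read_text()

print(names)  # prints: ['a.tar', 'b.tar']
```

For the dataset itself the call would be write_filelist("data/qm9", "*.tar", "qm9vasp.txt"), run from the repository root.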
The data split we used in our work is provided under data/qm9/datasplits.json
Download from Jørgensen and Bhowmik into data/nmc.
Create the filelist for training:
cd data/nmc
ls -1 *.tar > cellrelax.txt
The data split files will be downloaded with the dataset.
Structures obtained from google-deepmind/materials_discovery. See data/gnome/filelist.txt for the list of file IDs studied in our work.
A checkpoint can be tested with the following syntax:
python src/test_from_config.py -cd configs/charge3net/ -cn train_mp.yaml checkpoint_path=<checkpoint_path> <-m> <relevant overrides>
NOTE: The outputs saved from this method will not show up in the same directory tree as your checkpoint_path. They will appear in the hydra.job.name directory from configs/charge3net/train_mp.yaml. To fix this, you can override a location to save outputs within the checkpoint_path tree, like so: trainer.logger.save_dir=<some output dir>
In this case, <relevant overrides> is considered to be anything that changes the model parameters, or other parameters related to testing the model. The model overrides need to match the config elements that were used to create the checkpoint at checkpoint_path. Relevant parameters might include:
model.model.num_interactions: typically 3 for PaiNN and 6 for SchNet
data.test_probes: 1000 for a quick test, null for all probes (full grid)
cube_dir: directory to output density cube predictions (output as .npy arrays)
An alternative way to test the model is to pass the output config from a training run directly.
python src/test_from_config.py -cd results/charge3net/mp/2023-05-01/12-49-14/0/.hydra/ -cn config trainer.num_nodes=1 trainer.devices=1 data.test_probes=1000 trainer.logger.save_dir=results/charge3net/mp/2023-05-01/12-49-14/0/test_0 -m
However, this method will NOT work when running distributed testing with Slurm: paths and launcher info are stored in the output hydra.yaml file and are not accessible from the output config.yaml.
To bypass this, specify the relevant paths and launcher info as overrides, like so:
python src/test_from_config.py -cd results/charge3net/mp/2023-05-01/12-49-14/0/.hydra/ -cn config.yaml hydra/launcher=submitit_slurm hydra.job.name=mp hydra.run.dir=results/charge3net/mp/2023-05-01/12-49-14/0/ hydra.sweep.dir='${hydra.run.dir}' hydra.launcher.partition=gaia hydra.sweep.subdir=test hydra.launcher.nodes='${trainer.num_nodes}' hydra.launcher.tasks_per_node='${trainer.devices}' hydra.launcher.constraint=xeon-g6 +hydra.launcher.additional_parameters.gres=gpu:volta:2 trainer.num_nodes=1 trainer.devices=2 data.test_probes=1000 -m
This method can be cumbersome, but if it's difficult to reference the original config and you need to run distributed testing/inference, it will work.
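Long override lists like the ones above are easier to maintain when assembled programmatically. The following is a minimal sketch, assuming only the command-line shape shown in this section (-cd, -cn, key=value overrides, optional -m); build_test_command is a hypothetical helper, not part of the repository, and the placeholder values are kept as-is.

```python
import shlex


def build_test_command(config_dir, config_name, overrides=None, multinode=False):
    """Assemble the argument list for src/test_from_config.py."""
    cmd = ["python", "src/test_from_config.py", "-cd", config_dir, "-cn", config_name]
    # Each override becomes a single key=value argument.
    cmd += [f"{key}={value}" for key, value in (overrides or {}).items()]
    if multinode:
        cmd.append("-m")
    return cmd


cmd = build_test_command(
    "configs/charge3net/",
    "train_mp.yaml",
    overrides={
        "checkpoint_path": "<checkpoint_path>",  # placeholder, fill in your path
        "data.test_probes": 1000,
        "trainer.logger.save_dir": "<some output dir>",  # placeholder
    },
    multinode=True,
)
# shlex.join quotes arguments so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list directly to subprocess.run(cmd, check=True) avoids shell quoting altogether, which helps with values such as '${hydra.run.dir}' that must reach Hydra unexpanded.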
DISTRIBUTION STATEMENT A. Approved for public release. Distribution is unlimited.
This material is based upon work supported by the Under Secretary of Defense for Research and Engineering under Air Force Contract No. FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Under Secretary of Defense for Research and Engineering.
© 2023 Massachusetts Institute of Technology.
Subject to FAR52.227-11 Patent Rights - Ownership by the contractor (May 2014)
The software/firmware is provided to you on an As-Is basis
Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.