colabfit / data-lake

A repository to request ingestion of datasets to ColabFit

Dataset request: xxMD-CASSCF and xxMD-DFT #12

Closed jvita closed 8 months ago

jvita commented 1 year ago

Contribute content


Contact information about the person contributing/requesting the data. Used for communication purposes.

Name: Josh Vita
Email:


Any information necessary to help ColabFit find and access the data, and to correctly cite relevant material. The "name" and "description" will be used when publishing to the ColabFit exchange, and should be human-readable. The author list should include full first names, unless the author is normally attributed by initials. Links should include relevant publications and the online location of the dataset, if available.


Authors: Zihan Pengmei, Junyu Liu, Yinan Shu


The xxMD (Extended Excited-state Molecular Dynamics) dataset is a comprehensive collection of non-adiabatic trajectories encompassing several photo-sensitive molecules. This dataset challenges existing Neural Force Field (NFF) models with broader nuclear configuration spaces that span reactant, transition state, product, and conical intersection regions, making it more chemically representative than its contemporaries.


Details regarding how the data was computed in order to improve reproducibility. Provide as much information as possible. Input files are highly encouraged. Additional details might include functional, basis set, energy cutoff, k-point grid, reference energy, etc.

Method: DFT, CASSCF
Software: Unknown
Additional details: M06 XC-functional
Files: None

Included properties

See the current list of ColabFit property definitions. If you believe your data does not match one of the existing definitions, then you must submit a new property definition following the template provided in the examples folder.

Note: it appears that they have computed the energies/forces for all of the configurations at two levels of theory.

| Name | Units | Notes |
| --- | --- | --- |
| potential-energy | Unknown | |
| atomic-forces | Unknown | |


Basic information explaining the types of configurations in the dataset, and how they are organized.
Elements should be listed by chemical symbol.

Elements: C, H, O, N
Number of configurations: lots
Storage format: ASE?

Naming convention

If your configurations have names, please describe where their names can be found (e.g., as a field in a dictionary).

Names can be generated by assigning indices to the configurations, prepended with their full path. For example: xxMD-DFT/sti/0.
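A minimal sketch of that naming scheme, assuming each source file holds an ordered list of configurations. The helper `generate_names` and the file name `xxMD-DFT/sti.xyz` are hypothetical; only the resulting pattern `xxMD-DFT/sti/0` comes from the request above.

```python
from pathlib import PurePosixPath


def generate_names(relative_path: str, n_configs: int) -> list[str]:
    """Build one name per configuration as '<path without extension>/<index>'.

    `relative_path` is the file's path relative to the dataset root
    (hypothetical layout; the real file names are not given in the request).
    """
    # Drop the file extension so 'xxMD-DFT/sti.xyz' becomes 'xxMD-DFT/sti'.
    stem = str(PurePosixPath(relative_path).with_suffix(""))
    # Append the zero-based position of each configuration within the file.
    return [f"{stem}/{index}" for index in range(n_configs)]


if __name__ == "__main__":
    for name in generate_names("xxMD-DFT/sti.xyz", 3):
        print(name)
```

Indexing by position keeps names stable only as long as the order of configurations in each file does not change.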

Configuration sets

Configuration sets are used to define a conceptual grouping over a collection of atomic configurations. Configuration sets are constructed via regex filtering on specified keys.

Data appears to be grouped into configuration sets, but I'm not yet sure what these correspond to.

| Key | Regex | Description |
| --- | --- | --- |
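To illustrate how regex filtering on a key could produce configuration sets, here is a rough sketch in plain Python. It does not use the ColabFit API. The two set definitions (one per level of theory) are assumptions, since the request does not yet say what the groupings correspond to, and the name `xxMD-CASSCF/sti/0` is hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical (key, regex, description) rows, mirroring the table columns.
CONFIGURATION_SETS = [
    ("name", r"^xxMD-DFT/", "Configurations computed with DFT"),
    ("name", r"^xxMD-CASSCF/", "Configurations computed with CASSCF"),
]


def build_configuration_sets(configs, definitions):
    """Group configuration names by the first-column key matching each regex."""
    sets = defaultdict(list)
    for key, pattern, description in definitions:
        regex = re.compile(pattern)
        for config in configs:
            # A configuration joins the set when the value stored under
            # `key` matches the regex; missing keys never match.
            if regex.search(str(config.get(key, ""))):
                sets[description].append(config["name"])
    return dict(sets)


if __name__ == "__main__":
    configs = [
        {"name": "xxMD-DFT/sti/0"},
        {"name": "xxMD-CASSCF/sti/0"},  # hypothetical name
        {"name": "xxMD-DFT/sti/1"},
    ]
    for description, names in build_configuration_sets(configs, CONFIGURATION_SETS).items():
        print(description, names)
```

A configuration can fall into more than one set if several regexes match it, which is why the sets are built independently rather than by partitioning.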

Configuration labels

Configuration labels can be attached to your data to improve interpretability. This is done via regex matching on specified keys.

| Key | Regex | Label |
| --- | --- | --- |
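Since no label rules were filled in for this request, the following is only a sketch of how regex-based labeling might work, written in plain Python rather than against the ColabFit API. The rules and label strings (`sti`, `casscf`) are hypothetical placeholders.

```python
import re

# Hypothetical (key, regex, label) rows, mirroring the table columns.
LABEL_RULES = [
    ("name", r"/sti/", "sti"),
    ("name", r"^xxMD-CASSCF/", "casscf"),
]


def attach_labels(configs, rules):
    """Add every matching label to each configuration's 'labels' list, in place."""
    compiled = [(key, re.compile(pattern), label) for key, pattern, label in rules]
    for config in configs:
        labels = config.setdefault("labels", [])
        for key, regex, label in compiled:
            # Unlike set membership, labels accumulate on the configuration
            # itself; duplicates are skipped so reruns are harmless.
            if regex.search(str(config.get(key, ""))) and label not in labels:
                labels.append(label)
    return configs


if __name__ == "__main__":
    labeled = attach_labels(
        [{"name": "xxMD-DFT/sti/0"}, {"name": "xxMD-CASSCF/sti/0"}],
        LABEL_RULES,
    )
    for config in labeled:
        print(config["name"], config["labels"])
```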

Distribution License

The license under which the content will be distributed (e.g. Creative Commons Zero)

Creative Commons Attribution 4.0 International

gpwolfe commented 10 months ago

staged for ingest following next database update

gpwolfe commented 8 months ago

new database now live