colabfit / data-lake

A repository to request ingestion of datasets to ColabFit

Dataset request: feature_comparison_NJP2023 #11

Open jvita opened 1 year ago

jvita commented 1 year ago

Contribute content


Contact information about the person contributing/requesting the data. Used for communication purposes.

Name: Josh Vita
Email:


Any information necessary to help ColabFit find and access the data, and to correctly cite relevant material. The "name" and "description" will be used when publishing to the ColabFit Exchange, and should be human-readable. The author list should include full first names, unless the author is normally attributed by initials. Links should include relevant publications and the online location of the dataset, if available.

Name: feature_comparison_NJP2023

Authors: Ting Han, Jie Li, Liping Liu, Fengyu Li, Lin-Wang Wang




Details regarding how the data was computed in order to improve reproducibility. Provide as much information as possible. Input files are highly encouraged. Additional details might include functional, basis set, energy cutoff, k-point grid, reference energy, etc.

Method: DFT
Software: Unknown
Additional details: Unknown
Files: Unknown

Included properties

See the current list of ColabFit property definitions. If you believe your data does not match one of the existing definitions, then you must submit a new property definition following the template provided in the examples folder.

Name Units Notes


Basic information explaining the types of configurations in the dataset, and how they are organized.
Elements should be listed by chemical symbol.

Elements: C, S
Number of configurations: 4000?
Storage format: Unknown

Naming convention

If your configurations have names, please describe where their names can be found (e.g., as a field in a dictionary).

Configuration sets

Configuration sets are used to define a conceptual grouping over a collection of atomic configurations. Configuration sets are constructed via regex filtering on specified keys.

Paper contents suggest that configuration sets have been defined. However, this can't be verified without access to the data.

Key Regex Description
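
For illustration only, regex filtering on a key could look like the sketch below. It uses plain Python rather than the ColabFit tooling, and every configuration name, key, pattern, and description in it is hypothetical, since the actual data has not been inspected.

```python
import re

# Hypothetical configurations; the real dataset's keys and names are unknown.
configurations = [
    {"name": "C_bulk_0001", "elements": ["C"]},
    {"name": "C_bulk_0002", "elements": ["C"]},
    {"name": "S_chain_0001", "elements": ["S"]},
]

# Hypothetical (key, regex, description) rows, mirroring the table columns above.
set_definitions = [
    ("name", r"^C_", "Carbon configurations"),
    ("name", r"^S_", "Sulfur configurations"),
]


def build_configuration_sets(configs, definitions):
    """Group configurations by matching a regex against the value of a key."""
    sets = {}
    for key, pattern, description in definitions:
        regex = re.compile(pattern)
        # A configuration joins the set if the key's value matches the regex.
        sets[description] = [
            c for c in configs if regex.search(str(c.get(key, "")))
        ]
    return sets


configuration_sets = build_configuration_sets(configurations, set_definitions)
for description, members in configuration_sets.items():
    print(description, [c["name"] for c in members])
```

Each row of the table would then correspond to one entry in `set_definitions`.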

Configuration labels

Configuration labels can be attached to your data to improve interpretability. This is done via regex matching on specified keys.

Paper contents suggest that configuration labels may exist. However, this can't be verified without access to the data.

Key Regex Label

Distribution License

The license under which the content will be distributed (e.g. Creative Commons Zero)


jvita commented 1 year ago

More notes that I took:

gpwolfe commented 10 months ago

contacted corresponding author

gpwolfe commented 1 month ago

Data link appears to be repaired.
Base link:
DFT dataset(s) link:
Parser may be adapted from

A number of the dataset's splits (split by element type and by number of images in the trajectory) are only available for download through the Baidu Netdisk client. The others are prepared for ingest.