colabfit / data-lake

A repository to request ingestion of datasets to ColabFit
https://colabfit.org/

Dataset request: AIMD-Chig #17

Closed by jvita 8 months ago

jvita commented 11 months ago

Name

Josh Vita

Email

vita1@llnl.gov

Dataset name

AIMD-Chig

Authors

Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu

Links

Dataset description

This dataset covers the conformational space of Chignolin with DFT-level precision. The authors sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations to Chignolin, a 10-amino-acid protein, and collected 2 million biomolecular structures with quantum-level energy and force records.

File details

The data repository includes a README describing the folder contents and structure. According to the README, the data is stored in XYZ format and grouped by "anchor".

The total size appears to be ~15 GB (zipped).
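
Since the README reports XYZ format, a minimal reader sketch may help when inspecting the files before ingestion. It assumes the conventional XYZ layout (an atom-count line, a comment line, then one line per atom), which has not been verified against the actual files. The name `iter_xyz_frames` is hypothetical, and any numeric columns beyond the coordinates (for example forces, if present) are returned as-is rather than interpreted.

```python
from typing import Iterable, Iterator, List, Tuple

# (comment line, element symbols, numeric columns per atom)
Frame = Tuple[str, List[str], List[List[float]]]


def iter_xyz_frames(lines: Iterable[str]) -> Iterator[Frame]:
    """Yield one tuple per frame of a plain, possibly multi-frame, XYZ stream.

    Assumed layout (not checked against the AIMD-Chig files):
      line 1: number of atoms
      line 2: free-form comment
      then one line per atom: symbol followed by numeric columns
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between frames
        n_atoms = int(header.split()[0])
        comment = next(it).rstrip("\n")
        symbols: List[str] = []
        columns: List[List[float]] = []
        for _ in range(n_atoms):
            parts = next(it).split()
            symbols.append(parts[0])
            columns.append([float(v) for v in parts[1:]])
        yield comment, symbols, columns


# Tiny made-up sample, only to show the call pattern.
sample = """2
illustrative comment line
H 0.0 0.0 0.0
H 0.0 0.0 0.74
"""
frames = list(iter_xyz_frames(sample.splitlines()))
```

Reading frame by frame keeps memory use flat, which matters for an archive of this size. A truncated file raises an error instead of yielding a partial frame.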

Method

DFT

Method (other)

No response

Software

ORCA

Software (other)

No response

Software version(s)

4.2.1

Additional details

The M06-2X functional with the 6-31G* basis set was used for the calculations.

Property types

Atomic forces, Potential energy

Other/additional property

No response

Property details

Elements

C, H, N, O (Chignolin)

Number of Configurations

2,000,000

Naming convention

Names can likely be generated by //snapshot.xyz.

Configuration sets

No response

Configuration labels

No response

Distribution license

CC BY 4.0

Permissions

jvita commented 11 months ago

Some notes for improving the upload template:

gpwolfe commented 10 months ago

It looks like the subdirectories [0 - 99] in this case are just a way of dividing the 10K initial structures into manageable, 100-structure chunks. The actual divisions (by initial structure, or 'anchors', as defined in the text) are separated into the 10K individual files. Would that be an unreasonable number of configuration sets?

jvita commented 10 months ago

In that case, I'd say that it should be left as a single CS. Perhaps the anchors could be given as labels.
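
One way the "anchors as labels" idea could be sketched in plain Python is shown below. It is not the ColabFit ingestion code. It assumes the layout described above (chunk subdirectories each holding one XYZ file per anchor) and further assumes, hypothetically, that the file stem identifies the anchor. The function name `anchor_labels`, the glob pattern, and the `anchor_<stem>` label format are all illustrative.

```python
from pathlib import Path
from typing import Dict, List


def anchor_labels(root: Path, pattern: str = "*/*.xyz") -> Dict[str, List[str]]:
    """Map each XYZ file under `root` to a one-element list holding its anchor label.

    Assumptions (hypothetical, not confirmed by the dataset README):
      - files sit one level below `root`, inside the chunk subdirectories
      - the file stem identifies the anchor
    """
    labels: Dict[str, List[str]] = {}
    for path in sorted(root.glob(pattern)):
        key = path.relative_to(root).as_posix()
        labels[key] = [f"anchor_{path.stem}"]
    return labels
```

With this mapping, all configurations stay in a single configuration set, and the anchor is carried only as a per-configuration label, so the 10K anchors do not become 10K separate sets.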

gpwolfe commented 10 months ago

Staged for ingest after next database update

gpwolfe commented 8 months ago

new database now live