jvita closed this issue 8 months ago.
Some notes for improving the upload template:
It looks like the subdirectories [0 - 99] are just a way of dividing the 10K initial structures into manageable chunks of 100 structures each. The actual divisions (by initial structure, or "anchor", as defined in the text) correspond to the 10K individual files. Would 10K configuration sets be an unreasonable number?
If so, I'd say the data should be left as a single configuration set (CS). Perhaps the anchors could instead be given as configuration labels.
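A minimal sketch of the "single CS, anchors as labels" idea, assuming a hypothetical layout of `root/<chunk 0-99>/<anchor>.xyz` with one file per anchor (check this against the repository README). The chunk directory is ignored, and the file stem becomes the label; `anchor_labels` and the `anchor_` prefix are illustrative names, not part of any existing ingest API.

```python
from pathlib import Path


def anchor_labels(root):
    """Map each anchor file to a configuration label.

    Hypothetical layout (assumption): root/<chunk>/<anchor>.xyz.
    The <chunk> directory only splits the 10K anchors into groups
    of 100, so it is ignored; every file stays in one configuration
    set and the anchor (file stem) is recorded as a label instead.
    """
    labels = {}
    for path in sorted(Path(root).glob("*/*.xyz")):
        labels[path] = f"anchor_{path.stem}"
    return labels
```

This keeps the number of configuration sets at one, while the labels still allow filtering by anchor.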
Staged for ingest after the next database update.
New database now live.
Name
Josh Vita
Email
vita1@llnl.gov
Dataset name
AIMD-Chig
Authors
Tong Wang, Xinheng He, Mingyu Li, Bin Shao, Tie-Yan Liu
Links
Dataset description
This dataset covers the conformational space of Chignolin with DFT-level precision. We sequentially applied replica exchange molecular dynamics (REMD), conventional MD, and ab initio MD (AIMD) simulations to a 10-amino-acid protein, Chignolin, and finally collected an unparalleled 2 million biomolecule structures with quantum-level energy and force records.
File details
The data repository includes a README specifying the folder contents/structure, which reports that the data are stored in XYZ format and grouped by "anchor".
In total, the data appear to be ~15 GB (zipped).
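Since the data are stored as XYZ, a minimal stdlib reader for (possibly multi-frame) plain XYZ text is sketched below. Each frame is an atom-count line, a free-form comment line, and one `symbol x y z [extra columns]` line per atom. Where the AIMD-Chig files actually store energies and forces (comment line vs. extra per-atom columns) is an assumption to verify against the README, so the reader keeps the comment and any extra columns raw. The helper names are hypothetical.

```python
def read_xyz_frames(text):
    """Parse plain XYZ text that may contain several frames.

    Extra per-atom columns (possibly forces; an assumption) are kept
    as floats in "extra"; the comment line is returned unparsed.
    """
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # tolerate blank lines between frames
            continue
        natoms = int(lines[i].split()[0])
        if i + 2 + natoms > len(lines):
            raise ValueError(f"frame starting at line {i + 1} is truncated")
        comment = lines[i + 1]
        symbols, positions, extra = [], [], []
        for line in lines[i + 2 : i + 2 + natoms]:
            fields = line.split()
            symbols.append(fields[0])
            positions.append(tuple(float(x) for x in fields[1:4]))
            extra.append(tuple(float(x) for x in fields[4:]))
        frames.append({"comment": comment, "symbols": symbols,
                       "positions": positions, "extra": extra})
        i += 2 + natoms
    return frames


def unique_elements(frames):
    """Sorted list of chemical symbols seen across all frames."""
    return sorted({s for frame in frames for s in frame["symbols"]})
```

`unique_elements` is one way to fill in the "Elements" field from the data itself rather than by hand.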
Method
DFT
Method (other)
No response
Software
ORCA
Software (other)
No response
Software version(s)
4.2.1
Additional details
The M06-2X functional with the 6-31G* basis set was used for the calculations.
Property types
Atomic forces, Potential energy
Other/additional property
No response
Property details
Elements
C, H, N, O (Chignolin)
Number of Configurations
2,000,000
Naming convention
Names can likely be generated from the file path: //snapshot.xyz.
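A minimal sketch of path-based naming that does not assume a particular directory depth: the path relative to the data root (with the `.xyz` suffix dropped) is joined with `__` and suffixed with the frame index inside the file. The `config_name` helper, the `__` separator, and the `AIMD-Chig` prefix are hypothetical choices; the real layout should come from the repository README.

```python
from pathlib import PurePosixPath


def config_name(relpath, frame_index, prefix="AIMD-Chig"):
    """Build a configuration name from a path relative to the data root.

    Hypothetical scheme: prefix, each directory component, the file
    stem, and the frame index within the file, joined with "__".
    """
    p = PurePosixPath(relpath)
    parts = list(p.parent.parts) + [p.stem]
    return "__".join([prefix, *parts, str(frame_index)])
```

For example, `config_name("0/anchor42.xyz", 3)` gives `AIMD-Chig__0__anchor42__3`; including the frame index keeps names unique when one file holds many snapshots.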
Configuration sets
No response
Configuration labels
No response
Distribution license
CC BY 4.0
Permissions