Open mkhorton opened 6 years ago
Also worth mentioning schema.org, http://schema.org/docs/schemas.html and their extensions for specialized applications.
If the intent is to create a standard, schema.org might have some good inspiration, and might be worth getting the QC schema listed as an extension. They define a vocabulary, not a file format, though JSON-LD is supported.
There's also http://esl.cecam.org/ESCDF_-_Electronic_Structure_Common_Data_Format which is not JSON at all, but is a serious attempt to create standards for some of the larger files produced by electronic structure codes, which might also be good for inspiration for what fields to include.
Thank you for the very helpful links. I would like to link your representations on the main page. Do you happen to have documentation links or examples of your Structure or Molecule JSON?
I didn't realize CECAM had the ESCDF project, I will have to reach out to them. Thanks for the tip!
Examples are easiest to obtain via the Materials Project's API, but I'll include two below.
This is a Structure (NiO):
{'@class': 'Structure',
 '@module': 'pymatgen.core.structure',
 'lattice': {'a': 2.9752645085077005,
             'alpha': 106.81598496229357,
             'b': 5.142196203291477,
             'beta': 120.00000472813846,
             'c': 2.975264009520628,
             'gamma': 73.1840097254608,
             'matrix': [[0.85888503, 2.4292932, 1.48763233],
                        [-2.57109727, 0.00392925, 4.45327129],
                        [1.71776904, -2.4292932, 0.0]],
             'volume': 37.15665600377399},
 'sites': [{'abc': [0.0, 0.0, 0.0],
            'label': 'Ni',
            'properties': {'coordination_no': 6,
                           'forces': [0.0, 0.0, 0.0],
                           'magmom': 1.724},
            'species': [{'element': 'Ni', 'occu': 1}],
            'xyz': [0.0, 0.0, 0.0]},
           {'abc': [0.5, 0.5, 0.5],
            'label': 'Ni',
            'properties': {'coordination_no': 6,
                           'forces': [0.0, 0.0, 0.0],
                           'magmom': -1.724},
            'species': [{'element': 'Ni', 'occu': 1}],
            'xyz': [0.0027784000000000697, 0.001964625000000053, 2.97045181]},
           {'abc': [0.2499979, 0.2500063, 0.7500021],
            'label': 'O',
            'properties': {'coordination_no': 6,
                           'forces': [0.00022106, 0.00015631, -0.00038288],
                           'magmom': 0.0},
            'species': [{'element': 'O', 'occu': 1}],
            'xyz': [0.8602593257436201, -1.213674465777165, 1.485250836581234]},
           {'abc': [0.7500021, 0.7499937, 0.2499979],
            'label': 'O',
            'properties': {'coordination_no': 6,
                           'forces': [-0.00022106, -0.00015631, 0.00038288],
                           'magmom': 0.0},
            'species': [{'element': 'O', 'occu': 1}],
            'xyz': [-0.85470252574362, 1.217603715777165, 4.455652783418766]}]}
The properties dict is a general store for site properties. Here coordination_no, forces and magmom (the magnetic moment) are defined, but in principle it can hold any key-value pair. The species entry handles oxidation state and partial occupancies: the Structure definition can handle crystals with disordered sites (i.e. a site that is, say, on average 50% one atom, 50% another).
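To make the partial-occupancy point concrete, here is a rough sketch of what a disordered site could look like using the same keys as the sites above. The Ni/Co pairing, the label and the two helper functions are made up for illustration; they are not pymatgen functions and the values are not taken from a real calculation.

```python
# Hypothetical disordered site: same layout as the sites above, but two
# species share one position. Elements and values are illustrative only.
disordered_site = {
    "abc": [0.0, 0.0, 0.0],
    "label": "Ni/Co",
    "properties": {"magmom": 0.0},
    "species": [
        {"element": "Ni", "occu": 0.5},
        {"element": "Co", "occu": 0.5},
    ],
}


def site_occupancy(site):
    """Return {element: occupancy} summed over a site's species entries."""
    totals = {}
    for entry in site["species"]:
        element = entry["element"]
        totals[element] = totals.get(element, 0.0) + entry["occu"]
    return totals


def is_ordered(site):
    """A site is ordered when a single species fully occupies it."""
    species = site["species"]
    return len(species) == 1 and species[0]["occu"] == 1


occupancy = site_occupancy(disordered_site)
```

An ordered site is then just the special case of one species entry with an occupancy of 1, which is why the same layout covers both.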
And this is a Molecule (methane):
{'@class': 'Molecule',
 '@module': 'pymatgen.core.structure',
 'charge': 0,
 'sites': [{'name': 'C',
            'properties': {},
            'species': [{'element': 'C', 'occu': 1}],
            'xyz': [0.0, 0.0, 0.0]},
           {'name': 'H',
            'properties': {},
            'species': [{'element': 'H', 'occu': 1}],
            'xyz': [0.0, 0.0, 1.089]},
           {'name': 'H',
            'properties': {},
            'species': [{'element': 'H', 'occu': 1}],
            'xyz': [1.026719, 0.0, -0.363]},
           {'name': 'H',
            'properties': {},
            'species': [{'element': 'H', 'occu': 1}],
            'xyz': [-0.51336, -0.889165, -0.363]},
           {'name': 'H',
            'properties': {},
            'species': [{'element': 'H', 'occu': 1}],
            'xyz': [-0.51336, 0.889165, -0.363]}],
 'spin_multiplicity': 1}
Molecules and Structures are very similar: the Structure is basically a Molecule with periodicity added. Molecules are collections of Sites, Structures are collections of PeriodicSites with a Lattice defined.
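As a small illustration of what the Lattice adds, the fractional coordinates (abc) of a PeriodicSite can be turned into Cartesian coordinates (xyz) by weighting the rows of the lattice matrix. This is a plain-Python sketch using the NiO numbers from the Structure example; frac_to_cart is a name chosen here, not a pymatgen function.

```python
# Lattice matrix (rows are the lattice vectors) copied from the NiO
# Structure example in this thread.
matrix = [
    [0.85888503, 2.4292932, 1.48763233],
    [-2.57109727, 0.00392925, 4.45327129],
    [1.71776904, -2.4292932, 0.0],
]


def frac_to_cart(abc, lattice_matrix):
    """Cartesian xyz as the sum over i of abc[i] times the i-th lattice vector."""
    return [
        sum(abc[i] * lattice_matrix[i][k] for i in range(3))
        for k in range(3)
    ]


# Second Ni site of the NiO example, fractional coordinates [0.5, 0.5, 0.5].
xyz = frac_to_cart([0.5, 0.5, 0.5], matrix)
```

Up to floating-point rounding, this reproduces the xyz stored for that site, so a Molecule site (xyz only) and a PeriodicSite (abc plus a lattice) carry the same positional information once the lattice is known.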
In general, if you're pursuing a new standard, it would be really nice to see support for both periodic crystals and isolated molecules. In a few places, I've seen tools or code designed solely for molecules where adding support for periodic crystals later turned out to be very difficult. This is just personal experience, but I think having periodicity in mind from the start is really helpful, and makes the standard more transferable.
The Materials Project software stack (including atomate for workflow generation, fireworks for running workflows/workflow management, pymatgen for analysis) makes heavy use of JSON: any class that subclasses MSONable has a JSON representation. This includes the Structure (periodic crystals) and Molecule classes in pymatgen, as well as the workflows themselves and calculation outputs.
I'm not sure if this is relevant to the current effort, since atomate is primarily used for inorganic materials at present, and is quite general and not quantum chemistry-specific, but I thought it'd be worth adding to the list in case it's of interest to anyone here.
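For anyone unfamiliar with the pattern, here is a minimal sketch of the idea behind a class with a JSON representation: it writes its own '@module' and '@class' alongside its data, as in the Structure and Molecule examples in this thread. SimpleSite is a hypothetical stand-in written only for illustration; it is not the actual MSONable implementation, which is more general.

```python
import json


class SimpleSite:
    """Hypothetical stand-in class; not part of pymatgen or monty."""

    def __init__(self, element, xyz):
        self.element = element
        self.xyz = list(xyz)

    def as_dict(self):
        # Record where the class lives so a reader can rebuild the object.
        return {
            "@module": type(self).__module__,
            "@class": type(self).__name__,
            "element": self.element,
            "xyz": self.xyz,
        }

    @classmethod
    def from_dict(cls, d):
        # Drop the "@"-prefixed metadata keys; the rest are constructor args.
        kwargs = {k: v for k, v in d.items() if not k.startswith("@")}
        return cls(**kwargs)


site = SimpleSite("C", [0.0, 0.0, 0.0])
text = json.dumps(site.as_dict())                  # object -> JSON string
restored = SimpleSite.from_dict(json.loads(text))  # JSON string -> object
```

The design point is that the serialized form is self-describing: a generic loader can read '@module' and '@class' to decide which class to rebuild, without a separate registry of types.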