MolSSI / QCSchema

A Schema for Quantum Chemistry
http://molssi-qc-schema.readthedocs.io/en/latest/index.html#
BSD 3-Clause "New" or "Revised" License

Add item to "existing JSON efforts" #18

Open mkhorton opened 6 years ago

mkhorton commented 6 years ago

The Materials Project software stack (including atomate for workflow generation, fireworks for running workflows/workflow management, pymatgen for analysis) makes heavy use of JSON -- any class that subclasses MSONable has a JSON representation. This includes Structure (periodic crystals) and Molecule classes in pymatgen, as well as the workflows themselves and calculation outputs.
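To illustrate the idea, here is a minimal, self-contained sketch of that kind of round trip. It is not the actual MSONable implementation: the `Site` class, the `REGISTRY` lookup and the `register`/`load` helpers are hypothetical names invented for this example, and only the `@module`/`@class` keys are taken from the examples in this thread.

```python
import json

# Hypothetical registry mapping (module, class name) to the class itself.
# This stands in for whatever lookup a real library performs.
REGISTRY = {}


def register(cls):
    """Class decorator that records a class so it can be rebuilt later."""
    REGISTRY[(cls.__module__, cls.__name__)] = cls
    return cls


@register
class Site:
    """Hypothetical stand-in for a class with a JSON representation."""

    def __init__(self, element, xyz):
        self.element = element
        self.xyz = list(xyz)

    def as_dict(self):
        # Store the class identity alongside the data.
        return {
            "@module": self.__class__.__module__,
            "@class": self.__class__.__name__,
            "element": self.element,
            "xyz": self.xyz,
        }

    @classmethod
    def from_dict(cls, d):
        return cls(d["element"], d["xyz"])


def load(d):
    """Rebuild an object from a dict carrying '@module' and '@class' keys."""
    cls = REGISTRY[(d["@module"], d["@class"])]
    return cls.from_dict(d)


# Round trip: object -> dict -> JSON text -> dict -> object.
text = json.dumps(Site("C", [0.0, 0.0, 0.0]).as_dict())
restored = load(json.loads(text))
```

The point of carrying the class identity in the payload is that a generic loader can deserialize any registered class without knowing its type in advance.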

I'm not sure whether this is relevant to the current effort: atomate is primarily used for inorganic materials at present, and it is quite general rather than quantum chemistry-specific. Still, I thought it'd be worth adding to the list in case it's of interest to anyone here.

mkhorton commented 6 years ago

Also worth mentioning: schema.org (http://schema.org/docs/schemas.html) and its extensions for specialized applications.

If the intent is to create a standard, schema.org might have some good inspiration, and might be worth getting the QC schema listed as an extension. They define a vocabulary, not a file format, though JSON-LD is supported.

mkhorton commented 6 years ago

There's also http://esl.cecam.org/ESCDF_-_Electronic_Structure_Common_Data_Format, which is not JSON at all but is a serious attempt to create standards for some of the larger files produced by electronic structure codes. It might also be a good source of inspiration for which fields to include.

dgasmith commented 6 years ago

Thank you for the very helpful links. I would like to link your representations on the main page. Do you happen to have documentation links or examples of your Structure or Molecule JSON?

I didn't realize CECAM had the ESCDF project; I will have to reach out to them. Thanks for the tip!

mkhorton commented 6 years ago

Examples are easiest to obtain via the Materials Project's API, but I'll include two below.

This is a Structure (NiO):

{'@class': 'Structure',
 '@module': 'pymatgen.core.structure',
 'lattice': {'a': 2.9752645085077005,
             'alpha': 106.81598496229357,
             'b': 5.142196203291477,
             'beta': 120.00000472813846,
             'c': 2.975264009520628,
             'gamma': 73.1840097254608,
             'matrix': [[0.85888503, 2.4292932, 1.48763233],
                        [-2.57109727, 0.00392925, 4.45327129],
                        [1.71776904, -2.4292932, 0.0]],
             'volume': 37.15665600377399},
 'sites': [{'abc': [0.0, 0.0, 0.0],
            'label': 'Ni',
            'properties': {'coordination_no': 6,
                           'forces': [0.0, 0.0, 0.0],
                           'magmom': 1.724},
            'species': [{'element': 'Ni', 'occu': 1}],
            'xyz': [0.0, 0.0, 0.0]},
           {'abc': [0.5, 0.5, 0.5],
            'label': 'Ni',
            'properties': {'coordination_no': 6,
                           'forces': [0.0, 0.0, 0.0],
                           'magmom': -1.724},
            'species': [{'element': 'Ni', 'occu': 1}],
            'xyz': [0.0027784000000000697, 0.001964625000000053, 2.97045181]},
           {'abc': [0.2499979, 0.2500063, 0.7500021],
            'label': 'O',
            'properties': {'coordination_no': 6,
                           'forces': [0.00022106, 0.00015631, -0.00038288],
                           'magmom': 0.0},
            'species': [{'element': 'O', 'occu': 1}],
            'xyz': [0.8602593257436201, -1.213674465777165, 1.485250836581234]},
           {'abc': [0.7500021, 0.7499937, 0.2499979],
            'label': 'O',
            'properties': {'coordination_no': 6,
                           'forces': [-0.00022106, -0.00015631, 0.00038288],
                           'magmom': 0.0},
            'species': [{'element': 'O', 'occu': 1}],
            'xyz': [-0.85470252574362, 1.217603715777165, 4.455652783418766]}]}

The properties dict is a general store for site properties. Here coordination_no, forces and magmom (magnetic moment) are defined, but in principle it can hold any key-value pair. The species list handles oxidation state and partial occupancies: the Structure definition can handle crystals with disordered sites (i.e. a site that is, say, on average 50% one atom and 50% another).
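As a rough sketch of what a disordered site could look like in this layout, the snippet below builds one ordered and one 50/50 site and checks their occupancies. The Ni/Mn mixture and the helper functions `total_occupancy` and `is_ordered` are made up for illustration; only the `species`, `element` and `occu` keys come from the example above.

```python
def total_occupancy(site):
    """Sum the occupancies of all species sharing this site."""
    return sum(sp["occu"] for sp in site["species"])


def is_ordered(site):
    """A site is ordered if exactly one species fully occupies it."""
    species = site["species"]
    return len(species) == 1 and species[0]["occu"] == 1


# Ordered site, as in the NiO example.
ordered_site = {
    "abc": [0.0, 0.0, 0.0],
    "species": [{"element": "Ni", "occu": 1}],
}

# Hypothetical disordered site: on average half Ni, half Mn.
disordered_site = {
    "abc": [0.0, 0.0, 0.0],
    "species": [
        {"element": "Ni", "occu": 0.5},
        {"element": "Mn", "occu": 0.5},
    ],
}

ordered_flags = [is_ordered(ordered_site), is_ordered(disordered_site)]
```

Representing species as a list even for ordered sites means the same code path handles both cases.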

And this is a Molecule (methane):

{'@class': 'Molecule',
 '@module': 'pymatgen.core.structure',
 'charge': 0,
 'sites': [{'name': 'C',
            'properties': {},
            'species': [{'element': 'C', 'occu': 1}],
            'xyz': [0.0, 0.0, 0.0]},
           {'name': 'H',
            'properties': {},
            'species': [{'element': 'H', 'occu': 1}],
            'xyz': [0.0, 0.0, 1.089]},
           {'name': 'H',
            'properties': {},
            'species': [{'element': 'H', 'occu': 1}],
            'xyz': [1.026719, 0.0, -0.363]},
           {'name': 'H',
            'properties': {},
            'species': [{'element': 'H', 'occu': 1}],
            'xyz': [-0.51336, -0.889165, -0.363]},
           {'name': 'H',
            'properties': {},
            'species': [{'element': 'H', 'occu': 1}],
            'xyz': [-0.51336, 0.889165, -0.363]}],
 'spin_multiplicity': 1}

Molecules and Structures are very similar: the Structure is basically a Molecule with periodicity added. Molecules are collections of Sites, while Structures are collections of PeriodicSites with a Lattice defined.
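The link between the two representations can be sketched with the NiO data above: a periodic site's Cartesian `xyz` follows from its fractional `abc` coordinates and the lattice `matrix`. This assumes the rows of `matrix` are the lattice vectors, which is consistent with the numbers in the example; `frac_to_cart` is a hypothetical helper written for this note, not a library function.

```python
def frac_to_cart(abc, matrix):
    """Convert fractional coordinates to Cartesian coordinates.

    Assumes each row of `matrix` is one lattice vector, so that
    xyz[j] = sum_i abc[i] * matrix[i][j].
    """
    return [sum(abc[i] * matrix[i][j] for i in range(3)) for j in range(3)]


# Lattice matrix copied from the NiO Structure example.
nio_matrix = [
    [0.85888503, 2.4292932, 1.48763233],
    [-2.57109727, 0.00392925, 4.45327129],
    [1.71776904, -2.4292932, 0.0],
]

# Second Ni site and first O site from the same example.
ni_xyz = frac_to_cart([0.5, 0.5, 0.5], nio_matrix)
o_xyz = frac_to_cart([0.2499979, 0.2500063, 0.7500021], nio_matrix)
```

Under that assumption, the results agree with the `xyz` values stored for those sites, so `xyz` is redundant with `abc` plus the lattice. A Molecule simply stores `xyz` directly because it has no lattice.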

In general, if you're pursuing a new standard, it would be really nice to see support for both periodic crystals and isolated molecules. In a few places, I've seen tools or code designed solely for molecules where adding support for periodic crystals later turned out to be very difficult. This is just personal experience, but I think having periodicity in mind from the start is really helpful and makes the standard more transferable.

mkhorton commented 6 years ago

Also http://stuchalk.github.io/scidata/