avirshup opened 8 years ago
From @jchodera on Slack:
I think we need to figure out which top-level structures "everybody" can agree are important.
The list you provide (topology, properties, forcefields, wavefunctions, geometry/dynamics) sounds like it is a bit too heterogeneous in the level of abstraction. Instead, something simpler, like
- topology (anything static)
- state information (anything dynamical)
- tool definitions and input parameters (which could include forcefields, QM levels of theory)
- tool-computed input properties for specific states (which could include wavefunctions as well); these could even be contained within the state definition, since they are associated with a given state
- tool-computed input properties for the topology (which would include cheminformatics stuff, or stuff that doesn't depend on a specific state)
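To make the five categories concrete, here is a minimal sketch of one possible JSON layout, assuming hypothetical key names (`topology`, `states`, `tools`, `topology_properties`) that are not part of any agreed schema. The two "tools" are toy stand-ins: a formula calculation that only needs the topology, and a bond-length calculation that needs the coordinates of one state. The coordinates are illustrative, not real geometry.

```python
import json
import math
from collections import Counter

# HYPOTHETICAL layout: key names such as "topology", "states", "tools" and
# "topology_properties" are illustrative and not part of any agreed schema.


def hill_formula(atoms):
    """Topology-level property: depends only on the static atom list."""
    counts = Counter(atom["element"] for atom in atoms)
    if "C" in counts:
        # Hill order: C, then H, then the remaining elements alphabetically
        order = [e for e in ("C", "H") if e in counts] + sorted(
            e for e in counts if e not in ("C", "H"))
    else:
        # Without carbon, every element (including H) is alphabetical
        order = sorted(counts)
    return "".join(e if counts[e] == 1 else f"{e}{counts[e]}" for e in order)


def bond_lengths(positions, bonds):
    """State-level property: depends on the coordinates of one state."""
    return [math.dist(positions[i], positions[j]) for i, j in bonds]


def build_document():
    # 1. Topology: anything static
    topology = {
        "atoms": [{"element": "O"}, {"element": "H"}, {"element": "H"}],
        "bonds": [[0, 1], [0, 2]],
    }
    # 2. Tool definitions and their input parameters (a force field or QM
    #    level of theory would go in "parameters"; these toy tools take none)
    tools = [
        {"id": "geometry", "parameters": {}},
        {"id": "formula", "parameters": {}},
    ]
    # 3. State: anything dynamical (toy coordinates, in angstrom)
    state = {
        "id": "state-0",
        "positions_angstrom": [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
    }
    # 4. State-specific computed properties, nested in the state they describe
    state["properties"] = [{
        "tool": "geometry",
        "name": "bond_lengths_angstrom",
        "value": bond_lengths(state["positions_angstrom"], topology["bonds"]),
    }]
    return {
        "topology": topology,
        "states": [state],
        "tools": tools,
        # 5. Computed properties of the topology (state-independent)
        "topology_properties": [{
            "tool": "formula",
            "name": "hill_formula",
            "value": hill_formula(topology["atoms"]),
        }],
    }


if __name__ == "__main__":
    print(json.dumps(build_document(), indent=2))
```

Each computed property records which tool produced it, so a reader can trace a value back to the parameters it was computed with. Nesting state properties inside the state (rather than in a separate top-level list) is one way to follow the suggestion in the last two bullets; a flat list keyed by state id would work too.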
You can take advantage of standardization that has already been done. What about a JSON format based on the Chemical Markup Language (CML) specification? If you go with JSON-LD, you get full semantics and full interoperability.
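The core idea behind JSON-LD is that an `@context` maps short JSON keys to globally unique IRIs, so two tools that use different key names can still agree on meaning. The sketch below is a deliberately simplified illustration of that key-to-IRI mapping only, not a JSON-LD processor; the `https://example.org/chem#` vocabulary and the term names are hypothetical placeholders, not a published chemistry vocabulary.

```python
# HYPOTHETICAL vocabulary: example.org is a placeholder; a real document would
# point at a published vocabulary.
VOCAB = "https://example.org/chem#"

document = {
    "@context": {
        "@vocab": VOCAB,                            # fallback for unmapped keys
        "topology": VOCAB + "topology",             # term -> IRI (string form)
        "formula": {"@id": VOCAB + "hillFormula"},  # term -> IRI (object form)
    },
    "topology": {"formula": "H2O"},
}


def expand_keys(node, context):
    """Simplified sketch: rewrite each key to the IRI the context assigns it.

    A real JSON-LD expansion also wraps values in arrays and @value objects,
    handles types, languages, nested contexts, etc.
    """
    if isinstance(node, list):
        return [expand_keys(item, context) for item in node]
    if not isinstance(node, dict):
        return node
    out = {}
    for key, value in node.items():
        if key == "@context":
            continue  # the context itself is consumed, not copied
        if key.startswith("@"):
            out[key] = expand_keys(value, context)  # keep JSON-LD keywords
            continue
        term = context.get(key)
        if isinstance(term, dict):
            term = term["@id"]
        iri = term if term is not None else context["@vocab"] + key
        out[iri] = expand_keys(value, context)
    return out


if __name__ == "__main__":
    print(expand_keys(document, document["@context"]))
```

For real documents, a full processor (for example the `pyld` library in Python) should be used instead of hand-rolled expansion.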
I like the idea of incorporating parts of CML; for instance, the CompChem dictionary has a lot of good descriptive fields for QM-computed properties.
@egonw - Thanks for pointing out JSON-LD, that actually seems like the solution to a problem we haven't created an issue for yet.
Also, would you mind pointing to some use cases for CML? I've been aware of it for a while, but haven't ever really done anything with it - it would be great to get a feel for the current use cases.
It's used in Bioclipse because it is the most verbose (explicit) file format, which lets us store information that other formats cannot hold (like atom type info, which may be particularly useful when using custom or new force fields!).
Because the original CML is XML, it is also easy to embed in other XML documents using the XML namespace standards, e.g. with CMLRSS (10.1021/ci034244p, green OA version), although that never really picked up momentum.
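As an illustration of the namespace point, here is a minimal sketch using Python's standard `xml.etree.ElementTree`: a plain RSS `<item>` carries a CML `molecule` in its own namespace, and a CML-aware reader can pick out just the chemistry. The helper `rss_item_with_molecule` is hypothetical, and the element and attribute names (`atomArray`, `elementType`, `atomRefs2`, `order`) follow common CML usage but should be checked against the CML schema before relying on them.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by CML documents; the "cml" prefix is only cosmetic
CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("cml", CML_NS)


def cml(tag):
    """Qualify a tag name with the CML namespace ({uri}tag form)."""
    return f"{{{CML_NS}}}{tag}"


def rss_item_with_molecule(title, elements, bonds):
    """Hypothetical helper: an RSS <item> carrying a CML molecule.

    elements: element symbols; bonds: (atom, atom, order) tuples using
    1-based atom indices, matching the generated atom ids a1, a2, ...
    """
    item = ET.Element("item")  # plain RSS element, no namespace
    ET.SubElement(item, "title").text = title
    molecule = ET.SubElement(item, cml("molecule"), {"id": "m1"})
    atom_array = ET.SubElement(molecule, cml("atomArray"))
    for i, symbol in enumerate(elements, start=1):
        ET.SubElement(atom_array, cml("atom"),
                      {"id": f"a{i}", "elementType": symbol})
    bond_array = ET.SubElement(molecule, cml("bondArray"))
    for a, b, order in bonds:
        ET.SubElement(bond_array, cml("bond"),
                      {"atomRefs2": f"a{a} a{b}", "order": str(order)})
    return item


if __name__ == "__main__":
    item = rss_item_with_molecule("Water", ["O", "H", "H"], [(1, 2, 1), (1, 3, 1)])
    xml_text = ET.tostring(item, encoding="unicode")
    print(xml_text)
    # A CML-aware reader iterates only over namespaced CML atoms
    parsed = ET.fromstring(xml_text)
    print([atom.get("elementType") for atom in parsed.iter(cml("atom"))])
```

An RSS reader that knows nothing about CML just sees an item with a title and ignores the foreign-namespace elements; that separation is what namespaces buy you.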
This is the big question: how is data laid out inside the JSON structure?
The best working example currently comes from @jchodera on Slack: