MolSSI / QCSchema

A Schema for Quantum Chemistry
http://molssi-qc-schema.readthedocs.io/en/latest/index.html#
BSD 3-Clause "New" or "Revised" License

Multi-method properties #39

Open dgasmith opened 6 years ago

dgasmith commented 6 years ago

In the course of a given QM computation, multiple properties (particularly one-electron properties) may be constructed, so a single field can become ambiguous. A good example is a CCSD density computation, which may form SCF, MP2, and CCSD densities. For a quantity like dipole_moments, a program may build a set of moments for each density.

One possibility is to have separate keys: scf_dipole_moments, mp2_dipole_moments, and ccsd_dipole_moments. Another possibility is to build a dipole moment definition object:

property: {
  type: "dipole moment",
  method: "SCF",
  value: ...
}

which holds the properties for each method. (Let me know if I misunderstood this @langner.)

Brought up by @langner in #37.
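
To compare the two options side by side, here is a minimal sketch of how a consumer would look up the MP2 dipole under each layout. The key names, the "properties" list, and the numbers are hypothetical placeholders, not part of any published QCSchema version and not computed results.

```python
# Placeholder numbers (not computed results); key and field names are
# hypothetical and not taken from any published QCSchema version.
flat_output = {
    "scf_dipole_moments": [0.0, 0.0, 0.80],
    "mp2_dipole_moments": [0.0, 0.0, 0.76],
    "ccsd_dipole_moments": [0.0, 0.0, 0.75],
}

object_output = {
    "properties": [
        {"type": "dipole moment", "method": "SCF", "value": [0.0, 0.0, 0.80]},
        {"type": "dipole moment", "method": "MP2", "value": [0.0, 0.0, 0.76]},
        {"type": "dipole moment", "method": "CCSD", "value": [0.0, 0.0, 0.75]},
    ]
}


def lookup_flat(output, suffix, method):
    """Flat layout: build the method-prefixed key, e.g. 'mp2' + '_dipole_moments'."""
    return output.get(f"{method.lower()}_{suffix}")


def lookup_object(output, prop_type, method):
    """Object layout: scan the property list for a matching (type, method) pair."""
    for prop in output.get("properties", []):
        if prop["type"] == prop_type and prop["method"].upper() == method.upper():
            return prop["value"]
    return None


print(lookup_flat(flat_output, "dipole_moments", "MP2"))
print(lookup_object(object_output, "dipole moment", "MP2"))
```

Both layouts answer the same question. The difference shows up when a new method (say CISD) appears: the flat layout needs a new key name, while the object layout only needs a new list entry.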

langner commented 6 years ago

I actually did not have a specific solution in mind; I just think it'd be worthwhile to consider generalizing here. Adding keys is probably not scalable. Would there also be ccd_dipole_moments and cis_dipole_moments? And what about quadrupole moments?

dgasmith commented 6 years ago

I think this is a good question for @loriab, who has thought about this quite a bit more than I have. Generally, my thought was to explicitly tag the "99% case" and make these keys extensible, like ci5_dipole_moments, which can be integrated into the schema via regex patterns.
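
A minimal sketch of the "known keys plus regex extension" idea, assuming a hypothetical rule of a lowercase alphanumeric method label followed by a recognised property suffix. In JSON Schema the same rule would typically live under the `patternProperties` keyword; the key set and pattern below are illustrative only.

```python
import re

# Hypothetical "99% case" keys that the schema would spell out explicitly.
KNOWN_KEYS = {"scf_dipole_moments", "mp2_dipole_moments", "ccsd_dipole_moments"}

# Hypothetical extension rule: a lowercase method label (letters and digits)
# followed by a recognised property suffix.
EXTENSION_KEY = re.compile(r"[a-z][a-z0-9]*_(dipole|quadrupole)_moments")


def classify_key(key):
    """Return 'known', 'extension', or 'unknown' for an output key."""
    if key in KNOWN_KEYS:
        return "known"
    if EXTENSION_KEY.fullmatch(key):
        return "extension"
    return "unknown"


for key in ["mp2_dipole_moments", "ci5_dipole_moments",
            "cis_quadrupole_moments", "CI5_dipole_moments"]:
    print(key, classify_key(key))
```

Note that a pattern like this says nothing about what "ci5" means; the method label is only a naming convention, which is part of what the following comments push back on.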

wadejong commented 6 years ago

I will push for objects instead of building a slew of keys, which a developer would have to look through and which would have to be extended every time someone thinks of a new method.

Now, the method alone does not provide enough information. In reality, the method should point to the full information: basis set, SCF and other settings, etc.
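
One way to read this suggestion is that each property record carries its full model specification, not just a method label. A minimal sketch, assuming a hypothetical `model` sub-object with `method`, `basis`, and `keywords` fields; the values are placeholders, not computed results.

```python
def make_property(prop_type, value, method, basis, keywords=None):
    """Hypothetical property record that carries its own model specification."""
    return {
        "type": prop_type,
        "value": value,
        "model": {"method": method, "basis": basis, "keywords": dict(keywords or {})},
    }


def find_values(properties, prop_type, **model_fields):
    """Return the values of properties whose model matches every given field."""
    return [
        p["value"]
        for p in properties
        if p["type"] == prop_type
        and all(p["model"].get(k) == v for k, v in model_fields.items())
    ]


props = [
    make_property("dipole moment", [0.0, 0.0, 0.80], "SCF", "cc-pVDZ"),
    make_property("dipole moment", [0.0, 0.0, 0.75], "CCSD", "cc-pVDZ",
                  {"freeze_core": True}),
]
print(find_values(props, "dipole moment", method="CCSD"))
print(len(find_values(props, "dipole moment", basis="cc-pVDZ")))
```

The cost of this design is duplication: if the enclosing output already records the model, every property repeats it.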

dgasmith commented 6 years ago

The output that these properties are attached to already records the method, basis, settings, etc. It seems overkill to point to these again in a property structure, no?

I don't quite understand. Defining keys seems to be a large part of this, so that users do not have to go fishing through each program's output keys to figure out what they have. Again, we do not need to define keys for everything; however, I am reasonably certain that the majority of cases are covered by a relatively small subset of possible keys.

Dipole moments are a bit of an odd case, as we have somewhat duplicate keys for multiple methods. However, the energy components of SCF/MP/CI/CC/SAPT/etc. are all quite unique.

wadejong commented 6 years ago

If properties are already attached to the method, etc., then having specific keys for specific methods seems to be moot. The reality is that many properties can be built from a (first-order) density matrix and integrals, and this density matrix can come from many methods. So it's not just dipole moments. Actually, this holds for every property that can be computed with response theory; there are now CCSD response methods out there.
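
To make the density-matrix point concrete: in atomic units, with a real symmetric total AO density matrix P and dipole integrals ⟨μ|r_k|ν⟩, the dipole is μ_k = Σ_A Z_A R_{A,k} − Σ_{μν} P_{μν} ⟨μ|r_k|ν⟩. A minimal sketch, assuming nested lists as a hypothetical storage format; the same function serves SCF, MP2, or CCSD, and only the density passed in changes. The inputs below are toy numbers, not a real molecule.

```python
def dipole_from_density(density, r_ints, charges, coords):
    """Dipole moment (atomic units) from a one-particle density matrix.

    density: total AO density matrix P (nbf x nbf nested lists), from any method
    r_ints:  three nbf x nbf matrices <mu|r_k|nu>, for k = x, y, z (assumed symmetric)
    charges: nuclear charges Z_A
    coords:  nuclear positions R_A in bohr
    """
    nbf = len(density)
    dipole = []
    for k in range(3):
        # Nuclear contribution: sum over atoms of Z_A * R_A,k.
        nuclear = sum(z * xyz[k] for z, xyz in zip(charges, coords))
        # Electronic contribution: Tr(P r_k); electrons carry charge -1.
        electronic = sum(density[m][n] * r_ints[k][m][n]
                         for m in range(nbf) for n in range(nbf))
        dipole.append(nuclear - electronic)
    return dipole


# Toy one-basis-function system with two different "densities".
r_ints = [[[0.0]], [[0.0]], [[0.5]]]
charges = [1.0]
coords = [(0.0, 0.0, 1.0)]
print(dipole_from_density([[1.0]], r_ints, charges, coords))
print(dipole_from_density([[2.0]], r_ints, charges, coords))
```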

langner commented 6 years ago

It's useful to have dedicated keys for the majority use cases, so maybe we should do both things. Several related issues come to mind:

  1. Where do we draw the line, to strike a good balance between convenience and unnecessary proliferation of rare properties?
  2. How do we deal with a situation where someone specifies the same property with both a dedicated key and some kind of full-blown object?
  3. There may be things about properties that one may wish to annotate, for example the center of mass for dipole moments if it's not aligned with whatever is specified for the molecule. How do you convey that without adding structure to the property?
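
Points 2 and 3 could be handled together if dedicated keys are treated as shortcuts that get folded into property objects. A minimal sketch, assuming a hypothetical shortcut-to-(type, method) table, a tolerance check for duplicates, and a hypothetical "origin" annotation that only the object form can carry.

```python
import math

# Hypothetical mapping from dedicated shortcut keys to (type, method) pairs.
SHORTCUT_KEYS = {
    "scf_dipole_moments": ("dipole moment", "SCF"),
    "ccsd_dipole_moments": ("dipole moment", "CCSD"),
}


def merge_properties(output, tol=1e-8):
    """Fold shortcut keys into the property objects of one output.

    A shortcut that duplicates an existing object must agree with it within
    `tol`; otherwise ValueError is raised. Extra annotations on the object
    (e.g. a hypothetical "origin" field) are kept.
    """
    merged = {(p["type"], p["method"]): p for p in output.get("properties", [])}
    for key, (ptype, method) in SHORTCUT_KEYS.items():
        if key not in output:
            continue
        value = output[key]
        existing = merged.get((ptype, method))
        if existing is None:
            merged[(ptype, method)] = {"type": ptype, "method": method, "value": value}
        elif len(existing["value"]) != len(value) or not all(
            math.isclose(a, b, abs_tol=tol) for a, b in zip(existing["value"], value)
        ):
            raise ValueError(f"{key} conflicts with the {method} {ptype} object")
    return list(merged.values())


output = {
    "scf_dipole_moments": [0.0, 0.0, 0.80],
    "properties": [
        {"type": "dipole moment", "method": "SCF", "value": [0.0, 0.0, 0.80],
         "origin": [0.0, 0.0, 0.0]},  # annotation that a bare key cannot carry
    ],
}
print(merge_properties(output))
```

Rejecting disagreements rather than silently preferring one form is a design choice here; a schema could just as well declare one form authoritative.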

dgasmith commented 6 years ago

  1. Does it particularly matter if someone documents a property that isn't widely known or used? Codes that do not construct the property can simply ignore it, and codes that implement the method have known keys to use.
  2. From a JSON Schema standpoint, the spec supports multiple objects as values. Digesting this does become more complex; any time we add more information, the complexity of parsing the returned JSON is going to go up.
  3. No idea. This kind of lumps in with other questions, such as what happens if someone asks for PCM or computes properties via a response density.
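
For the JSON Schema point in 2, one common pattern is to let a field accept either a single object or an array of objects via `oneOf`, and have consumers normalize to a list. A minimal sketch; the `#/definitions/property` reference and the helper name are hypothetical.

```python
# JSON Schema fragment (written as a Python dict) accepting either form;
# "#/definitions/property" is a hypothetical definition name.
DIPOLE_FIELD_SCHEMA = {
    "oneOf": [
        {"$ref": "#/definitions/property"},
        {"type": "array", "items": {"$ref": "#/definitions/property"}},
    ]
}


def as_property_list(value):
    """Accept a single property object or a list of them; always return a list."""
    if isinstance(value, dict):
        return [value]
    if isinstance(value, list) and all(isinstance(v, dict) for v in value):
        return list(value)
    raise TypeError("expected a property object or a list of property objects")


single = {"type": "dipole moment", "method": "SCF", "value": [0.0, 0.0, 0.80]}
print(as_property_list(single))
print(as_property_list([single, dict(single, method="MP2")]))
```

The normalization step is the extra parsing cost mentioned above: every consumer has to handle both shapes, or call a helper like this one.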

langner commented 6 years ago

I feel like we need some sort of test suite of example calculations that we would officially support, along with the corresponding JSON schema files that would represent them, for guidance. This would capture both the common cases and the special ones that have been raised. Maybe that would make the discussion more concrete? We've done this for cclib from the start, and it has helped tremendously to focus discussions about scope and supported methods.
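
A minimal sketch of what such a suite could check, assuming a hypothetical catalogue that pairs each example calculation with the (type, method) property pairs its JSON output must contain. The example names, input fields, and expectations are illustrative, not an agreed list.

```python
# Hypothetical catalogue of officially supported example calculations.
EXAMPLES = {
    "water_ccsd_density": {
        "input": {"model": {"method": "CCSD", "basis": "cc-pVDZ"}, "driver": "properties"},
        "expected": {("dipole moment", "SCF"), ("dipole moment", "MP2"),
                     ("dipole moment", "CCSD")},
    },
    "water_scf": {
        "input": {"model": {"method": "SCF", "basis": "cc-pVDZ"}, "driver": "properties"},
        "expected": {("dipole moment", "SCF")},
    },
}


def missing_properties(example_name, output):
    """Return expected (type, method) pairs absent from a program's JSON output."""
    found = {(p["type"], p["method"]) for p in output.get("properties", [])}
    return EXAMPLES[example_name]["expected"] - found


scf_only = {"properties": [{"type": "dipole moment", "method": "SCF",
                            "value": [0.0, 0.0, 0.80]}]}
print(missing_properties("water_scf", scf_only))
print(sorted(missing_properties("water_ccsd_density", scf_only)))
```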