Materials-Consortia / OPTIMADE

Specification of a common REST API for access to materials databases
https://optimade.org/specification
Creative Commons Attribution 4.0 International

Structure features 'assemblies' and 'disorder' depend on representation, not actual structure features #342

Open merkys opened 3 years ago

merkys commented 3 years ago

From the specification:

  • assemblies: this flag MUST be present if the property assemblies is present.
  • disorder: this flag MUST be present if any one entry in the species list has a chemical_symbols list that is longer than 1 element.
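The two quoted rules can be written as a small check. This is a minimal Python sketch. It assumes a structure is given as a plain dict of its properties (in an actual OPTIMADE response these sit under each entry's `attributes`). The function name `required_structure_features` is hypothetical, and only these two flags are checked.

```python
from typing import Any


def required_structure_features(structure: dict[str, Any]) -> list[str]:
    """Return, sorted, the flags that the two quoted rules make mandatory.

    Hypothetical helper: only the 'assemblies' and 'disorder' rules are
    checked; other flags defined by the specification are out of scope.
    """
    flags: set[str] = set()
    # Rule 1: 'assemblies' is required when the assemblies property is present.
    # Assumption: a key holding None (JSON null) is treated as absent.
    if structure.get("assemblies") is not None:
        flags.add("assemblies")
    # Rule 2: 'disorder' is required when any species lists more than one
    # chemical symbol (a vacancy counts as one of the symbols).
    if any(len(sp["chemical_symbols"]) > 1 for sp in structure.get("species", [])):
        flags.add("disorder")
    return sorted(flags)
```

Applied to the two descriptions that follow, it returns `["disorder"]` for the first and `["assemblies"]` for the second.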

However, neither assemblies nor disorder depends directly on the features of the structure; both depend on how the provider represents it. Consider these two alternative descriptions of the same structure (taken from the specification):

       {
         "cartesian_site_positions": [[0,0,0]],
         "species_at_sites": ["SiGe-vac"],
         "species": [
           {
             "name": "SiGe-vac",
             "chemical_symbols": ["Si", "Ge", "vacancy"],
             "concentration": [0.3, 0.5, 0.2]
           }
         ]
         // ...
       }

and

       {
         "cartesian_site_positions": [ [0,0,0], [0,0,0], [0,0,0] ],
         "species_at_sites": ["Si", "Ge", "vac"],
         "species": [
           { "name": "Si", "chemical_symbols": ["Si"], "concentration": [1.0] },
           { "name": "Ge", "chemical_symbols": ["Ge"], "concentration": [1.0] },
           { "name": "vac", "chemical_symbols": ["vacancy"], "concentration": [1.0] }
         ],
         "assemblies": [
           {
             "sites_in_groups": [ [0], [1], [2] ],
             "group_probabilities": [0.3, 0.5, 0.2]
           }
         ]
         // ...
       }

Thus, under the rules quoted above, the structure in the first example would have structure features [ "disorder" ], whereas the second would have [ "assemblies" ].
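To make the equivalence concrete, here is a minimal Python sketch that turns the second representation into the first. `collapse_simple_assembly` is a hypothetical helper, not part of any OPTIMADE library. It handles only the simple case in the example: one assembly covering every site, one site per group, all sites at exactly the same position, and single-symbol species. Building the mixed species name by joining the symbols is an arbitrary choice.

```python
from typing import Any


def collapse_simple_assembly(structure: dict[str, Any]) -> dict[str, Any]:
    """Rewrite a one-assembly structure as a single site with a mixed species.

    Hypothetical helper covering only the simple case from the example;
    anything else raises ValueError.
    """
    assemblies = structure["assemblies"]
    if len(assemblies) != 1:
        raise ValueError("exactly one assembly is supported")
    assembly = assemblies[0]
    groups = assembly["sites_in_groups"]
    if any(len(group) != 1 for group in groups):
        raise ValueError("each group must contain exactly one site")
    sites = [group[0] for group in groups]
    positions = structure["cartesian_site_positions"]
    if sorted(sites) != list(range(len(positions))):
        raise ValueError("the assembly must cover every site exactly once")
    # Exact comparison; real data would need a tolerance.
    if any(positions[i] != positions[sites[0]] for i in sites):
        raise ValueError("all grouped sites must share one position")

    species_by_name = {sp["name"]: sp for sp in structure["species"]}
    symbols = []
    for i in sites:
        sp = species_by_name[structure["species_at_sites"][i]]
        if len(sp["chemical_symbols"]) != 1:
            raise ValueError("only single-symbol species are supported")
        symbols.append(sp["chemical_symbols"][0])

    # The group probabilities become the concentrations of the mixed species.
    mixed_name = "-".join(symbols)  # arbitrary naming choice for this sketch
    return {
        "cartesian_site_positions": [list(positions[sites[0]])],
        "species_at_sites": [mixed_name],
        "species": [
            {
                "name": mixed_name,
                "chemical_symbols": symbols,
                "concentration": list(assembly["group_probabilities"]),
            }
        ],
    }
```

Applied to the second example, it yields one site with species `Si-Ge-vacancy`, concentrations `[0.3, 0.5, 0.2]` and no `assemblies` property. Under the quoted rules the required flag therefore switches from `assemblies` to `disorder`, although the described structure is unchanged.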

Having structure features that describe the representation rather than the actual structure seems somewhat counter-intuitive to me. Could anyone confirm whether this was intentional, or is this a corner case?

rartino commented 3 years ago

IMO structure_features was intended more as a content-negotiation feature between client and server than as something you would typically use to determine the "physics" of the material. Nevertheless, not declaring a feature (assemblies, disorder, etc.) does restrict the domain of the material. But, as your example shows, it doesn't strictly work the other way: declaring a feature is not a commitment that the structure has no simpler representation.
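The content-negotiation reading can be illustrated from the client side. This is a minimal Python sketch, assuming a client that understands only some flags and treats an entry as usable when every declared flag is supported. The names `can_handle`, `usable` and `supported` are hypothetical, and structures are again assumed to be plain dicts of properties. The asymmetry above shows up directly: a missing flag rules the feature out of this representation, while a present flag says nothing about whether a simpler representation exists.

```python
from collections.abc import Iterable
from typing import Any


def can_handle(structure: dict[str, Any], supported: Iterable[str]) -> bool:
    """Return True if every flag in structure_features is one the client supports.

    Hypothetical client-side check. An absent flag means this representation
    does not use that feature; a present flag only says that it does, not
    that no simpler representation of the same structure exists.
    """
    declared = set(structure.get("structure_features") or [])
    return declared <= set(supported)


def usable(structures: Iterable[dict[str, Any]], supported: Iterable[str]) -> list[dict[str, Any]]:
    """Keep only the structures this client can interpret as given."""
    supported_set = set(supported)
    return [s for s in structures if can_handle(s, supported_set)]
```

A client that only implements `assemblies` would, under this sketch, skip an entry flagged `disorder` even if the same structure could have been served in the assemblies form.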

Setting aside the computational difficulty of knowing for certain whether a simpler representation exists, a typical scenario in which I foresee these flags being used is this:

merkys commented 3 years ago

Thank you for the explanation. So structures with sites holding mixtures of chemical elements or vacancies seem to be corner cases. I would prefer some way to dispel the ambiguity, but cannot think of an elegant solution. We could of course try to standardize the representation, but I am in no position to suggest preferring one representation over the other.

In CIF files (the ultimate source of truth for the COD), vacancies are expressed through the occupancy parameter, which fits the first representation more naturally. Mixture sites are usually split into several sites with the same coordinates, and we at the COD do little to identify such sites, as the number of such entries is low.
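For illustration only (as noted, the COD does little of this), one way a provider might look for such split mixture sites is to group sites whose coordinates coincide within a tolerance. This is a minimal Python sketch. The function names and the default tolerance of `1e-4` (per coordinate, e.g. in fractional units) are assumptions, and periodic images such as 0.0 versus 1.0 are not handled.

```python
from collections.abc import Sequence


def group_coincident_sites(
    positions: Sequence[Sequence[float]], tol: float = 1e-4
) -> list[list[int]]:
    """Group site indices whose coordinates agree within `tol` on every axis.

    Greedy single pass: each site joins the first group whose first member
    lies within `tol`; otherwise it starts a new group. Periodic images are
    not considered.
    """
    groups: list[list[int]] = []
    for i, p in enumerate(positions):
        for group in groups:
            q = positions[group[0]]
            if all(abs(a - b) <= tol for a, b in zip(p, q)):
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


def candidate_mixture_sites(
    positions: Sequence[Sequence[float]], tol: float = 1e-4
) -> list[list[int]]:
    """Return only the groups with more than one site (possible mixture sites)."""
    return [g for g in group_coincident_sites(positions, tol) if len(g) > 1]
```

Each group with more than one member could then be a candidate for either representation discussed above, which is where the ambiguity reappears.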

merkys commented 3 months ago

We have revisited the topic in a workshop discussion with @rartino and @blokhin, and we seem to have arrived at a consensus that we are OK with assemblies, disorder and structure_features describing the representation of the data, not the underlying structure.