Structure features 'assemblies' and 'disorder' depend on representation, not actual structure features

merkys commented 3 years ago

From the specification:

assemblies: this flag MUST be present if the property assemblies is present.

disorder: this flag MUST be present if any one entry in the species list has a chemical_symbols list that is longer than 1 element.

However, neither assemblies nor disorder depends directly on the features of the structure; both depend on how the provider represents it. Consider these two alternative descriptions of the same structure (taken from the specification):

       {
         "cartesian_site_positions": [[0,0,0]],
         "species_at_sites": ["SiGe-vac"],
         "species": [
           {
             "name": "SiGe-vac",
             "chemical_symbols": ["Si", "Ge", "vacancy"],
             "concentration": [0.3, 0.5, 0.2]
           }
         ]
         // ...
       }

and

       {
         "cartesian_site_positions": [ [0,0,0], [0,0,0], [0,0,0] ],
         "species_at_sites": ["Si", "Ge", "vac"],
         "species": [
           { "name": "Si", "chemical_symbols": ["Si"], "concentration": [1.0] },
           { "name": "Ge", "chemical_symbols": ["Ge"], "concentration": [1.0] },
           { "name": "vac", "chemical_symbols": ["vacancy"], "concentration": [1.0] }
         ],
         "assemblies": [
           {
             "sites_in_groups": [ [0], [1], [2] ],
             "group_probabilities": [0.3, 0.5, 0.2]
           }
         ]
         // ...
       }

Thus the structure in the first example would have structure features [ "disorder" ], whereas the second one would have [ "assemblies" ].
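A minimal Python sketch of the two rules quoted from the specification, applied to both representations. The function name `derive_structure_features` and the variable names are hypothetical, "present" is assumed to mean that the key exists with a non-null value, and any other flags the specification defines are deliberately ignored. It is an illustration, not a reference implementation.

```python
def derive_structure_features(attributes):
    """Return the flags implied by the two rules quoted above (sketch only)."""
    features = set()

    # Rule for `assemblies`: the property is present.
    # Assumption: "present" means the key exists and is not null.
    if attributes.get("assemblies") is not None:
        features.add("assemblies")

    # Rule for `disorder`: some species lists more than one chemical symbol.
    species = attributes.get("species") or []
    if any(len(s.get("chemical_symbols", [])) > 1 for s in species):
        features.add("disorder")

    # Sorted only to make the output deterministic.
    return sorted(features)


# First representation: one site occupied by a mixed species.
mixed_site = {
    "cartesian_site_positions": [[0, 0, 0]],
    "species_at_sites": ["SiGe-vac"],
    "species": [
        {
            "name": "SiGe-vac",
            "chemical_symbols": ["Si", "Ge", "vacancy"],
            "concentration": [0.3, 0.5, 0.2],
        }
    ],
}

# Second representation: three co-located sites tied together by an assembly.
split_sites = {
    "cartesian_site_positions": [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
    "species_at_sites": ["Si", "Ge", "vac"],
    "species": [
        {"name": "Si", "chemical_symbols": ["Si"], "concentration": [1.0]},
        {"name": "Ge", "chemical_symbols": ["Ge"], "concentration": [1.0]},
        {"name": "vac", "chemical_symbols": ["vacancy"], "concentration": [1.0]},
    ],
    "assemblies": [
        {
            "sites_in_groups": [[0], [1], [2]],
            "group_probabilities": [0.3, 0.5, 0.2],
        }
    ],
}

print(derive_structure_features(mixed_site))   # ['disorder']
print(derive_structure_features(split_sites))  # ['assemblies']
```

Both inputs describe the same site, yet the derived flags differ, which is the point being raised.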

Having structure features that denote representation instead of actual structure features seems somewhat counter-intuitive to me. Could anyone confirm this was intentional, or is this a corner case?

rartino commented 3 years ago

IMO structure_features was intended more as a content negotiation feature between the client and server than something you would typically use to determine the "physics" of the material. Nevertheless, not declaring a feature (assemblies, disorder, etc.) does restrict the domain for the material. But, as you note in the example, it doesn't strictly work the other way: declaring a feature is not a commitment that the structure cannot have a simpler representation.

The computational difficulty in strictly knowing whether a simpler representation could exist aside, a typical scenario in which I foresee these flags being used is this:

A client fetches structures to do "normal" static DFT calculations. Hence, the user writing that client wants to exclude structures with disorder and assemblies, because those cannot easily be translated into, e.g., a VASP POSCAR file.
The client later adds the capability of transforming the OPTIMADE disorder representation into SQS supercells that can be calculated in VASP. Hence, the processing is extended to accept structures with the disorder feature. However, the greater generality of assemblies is not supported, so those structures are still excluded. (Even though, as you note, sometimes they could be translated into the simpler disorder representation.)
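The two stages of this scenario could be sketched on the client side as a simple set check. The helper `select_supported` and the three entries below are made up for illustration; the only assumption about the data is that each entry carries `structure_features` under `attributes`.

```python
def select_supported(structures, supported_features):
    """Keep entries whose declared features are all handled by the client."""
    supported = set(supported_features)
    return [
        entry
        for entry in structures
        if set(entry["attributes"].get("structure_features", [])) <= supported
    ]


# Hypothetical entries, reduced to the fields this check needs.
entries = [
    {"id": "plain", "attributes": {"structure_features": []}},
    {"id": "mixed", "attributes": {"structure_features": ["disorder"]}},
    {"id": "grouped", "attributes": {"structure_features": ["assemblies"]}},
]

# Stage 1: plain static DFT input only, so no special features are accepted.
stage_one = [e["id"] for e in select_supported(entries, [])]

# Stage 2: the client has learned to expand `disorder` (e.g. into SQS
# supercells), but still cannot handle `assemblies`.
stage_two = [e["id"] for e in select_supported(entries, ["disorder"])]

print(stage_one)  # ['plain']
print(stage_two)  # ['plain', 'mixed']
```

Because the check is a subset test, an entry is rejected as soon as it declares any flag the client does not know how to process, including flags the client has never heard of.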

merkys commented 3 years ago

Thank you for the explanation. So structures having sites with mixtures of chemical elements or vacancies seem to be corner cases. I would prefer some way to dispel the ambiguity, but cannot think of an elegant solution. Surely we could attempt to standardize the representation, but I am in no position to suggest putting one of the representations in front of the other.
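To make the ambiguity concrete, here is a hedged Python sketch that rewrites only the most trivial assemblies case (one assembly, one site per group, all sites co-located, each species a single symbol at full concentration) as a single mixed site. The function name `collapse_trivial_assembly`, the exact comparison of coordinates, and the way the merged species is named are all assumptions of this sketch; it is not a proposed standard and it returns `None` for anything outside that narrow case.

```python
def collapse_trivial_assembly(attributes):
    """Rewrite a trivial co-located assembly as one mixed site, else None."""
    assemblies = attributes.get("assemblies") or []
    if len(assemblies) != 1:
        return None

    groups = assemblies[0]["sites_in_groups"]
    probabilities = assemblies[0]["group_probabilities"]
    positions = attributes["cartesian_site_positions"]
    species_by_name = {s["name"]: s for s in attributes["species"]}

    # Each group must hold exactly one site, and the groups together must
    # cover every site of the structure exactly once.
    if not groups or any(len(group) != 1 for group in groups):
        return None
    site_indices = [group[0] for group in groups]
    if sorted(site_indices) != list(range(len(positions))):
        return None

    # All sites must coincide (assumption: exact coordinate equality).
    first_position = positions[site_indices[0]]
    if any(positions[i] != first_position for i in site_indices):
        return None

    # Each site must carry a single-symbol species at full concentration.
    symbols = []
    for i in site_indices:
        species = species_by_name[attributes["species_at_sites"][i]]
        if len(species["chemical_symbols"]) != 1:
            return None
        if species["concentration"] != [1.0]:
            return None
        symbols.append(species["chemical_symbols"][0])

    # Hypothetical naming scheme for the merged species.
    name = "-".join(symbols)
    return {
        "cartesian_site_positions": [first_position],
        "species_at_sites": [name],
        "species": [
            {
                "name": name,
                "chemical_symbols": symbols,
                "concentration": list(probabilities),
            }
        ],
    }


split_sites = {
    "cartesian_site_positions": [[0, 0, 0], [0, 0, 0], [0, 0, 0]],
    "species_at_sites": ["Si", "Ge", "vac"],
    "species": [
        {"name": "Si", "chemical_symbols": ["Si"], "concentration": [1.0]},
        {"name": "Ge", "chemical_symbols": ["Ge"], "concentration": [1.0]},
        {"name": "vac", "chemical_symbols": ["vacancy"], "concentration": [1.0]},
    ],
    "assemblies": [
        {"sites_in_groups": [[0], [1], [2]], "group_probabilities": [0.3, 0.5, 0.2]}
    ],
}

collapsed = collapse_trivial_assembly(split_sites)
print(collapsed["species"][0]["chemical_symbols"])  # ['Si', 'Ge', 'vacancy']
print(collapsed["species"][0]["concentration"])     # [0.3, 0.5, 0.2]
```

Only the three keys shown are carried over; a real normalization would also have to deal with every other structure property and with the general assemblies cases that have no equivalent mixed-site form.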

In CIF files (the ultimate source of truth for the COD), vacancies are expressed by the occupancy parameter, which fits more naturally in the first representation. Mixture sites are usually split into several sites with the same coordinates, and we at the COD do little to identify such sites, as the number of such entries is low.

merkys commented 5 months ago

We have revisited the topic in a workshop discussion with @rartino and @blokhin, and it seems that we have arrived at a consensus: we are OK with assemblies, disorder and structure_features describing the representation of the data, not the underlying structure.

Materials-Consortia / OPTIMADE

Structure features 'assemblies' and 'disorder' depend on representation, not actual structure features #342