Materials-Consortia / OPTIMADE

Specification of a common REST API for access to materials databases
https://optimade.org/specification
Creative Commons Attribution 4.0 International

Units (and their machine-readability) #283

Closed by giovannipizzi 4 years ago

giovannipizzi commented 4 years ago

At the moment, units can be specified but there is no explanation of what they should be except that they are a string. In #282 there is a slight improvement with a clarification.

In any case, they remain a string, so human-readable, rather than machine-readable.

Do we want to make this field better specified?

Some options:

  • We create a long list of accepted units (I think the Pauling File already has such a list?), possibly outside the specification so it's easier to add more?

merkys commented 4 years ago

Maybe we can reuse some well-maintained vocabulary of units?

giovannipizzi commented 4 years ago

@blokhin I think the Pauling File has one? Is it re-usable?

blokhin commented 4 years ago

Actually, PAULING FILE adopts SI units everywhere. A scalar property is always converted to SI and stored in SI, under the assumption that the other properties, i.e. the conditions (at least the temperature and the pressure), are known.
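To make the "convert to SI and store in SI" idea concrete, here is a minimal sketch. It is not the PAULING FILE implementation: the table `TO_SI`, the unit labels in it, and the function `to_si` are hypothetical names chosen only for illustration.

```python
from typing import Tuple

# Hypothetical lookup table: source unit label -> (SI unit, multiplicative factor).
# The labels are illustrative and not taken from any database or specification.
TO_SI = {
    "angstrom": ("m", 1e-10),
    "eV": ("J", 1.602176634e-19),  # exact since the 2019 SI redefinition
    "GPa": ("Pa", 1e9),
}


def to_si(value: float, unit: str) -> Tuple[float, str]:
    """Return (value in SI, SI unit); raises KeyError for unknown units."""
    si_unit, factor = TO_SI[unit]
    return value * factor, si_unit
```

For example, `to_si(2.5, "angstrom")` returns the length in metres together with the string `"m"`. A purely multiplicative table like this cannot handle units with an offset, such as degrees Celsius.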

blokhin commented 4 years ago

Interestingly, such discussions are also conducted at the International Bureau of Weights and Measures.

blokhin commented 4 years ago

In general, I believe OPTIMADE should ideally also enforce SI in the materials databases, but in practice that is of course unattainable. Computational Raman spectroscopy is an excellent example: the units there are very confusing and entangled, and you cannot do much with them unless you are an expert.

On the other hand, there is a list of well-known physical properties that are easily computed ab initio and have clearly defined units. Could we at least try to collect them in a single place?

merkys commented 4 years ago

There is The Unified Code for Units of Measure, which aims to provide an unambiguous string representation of units. Maybe it is worth further investigation?

rartino commented 4 years ago

Standardizing how to enter units in the fields that say "units" seems a very nice thing to do, perhaps even for v1.0.0 proper (?)

I'm afraid it is too late to completely standardize on only SI units, since our structures' Cartesian coordinates are specified in Angstrom.

After some searching, I landed on the same info as @merkys. We could simply add to the definition of our unit fields that they must be a valid 7-bit ASCII representation of units according to version 2.1 of The Unified Code for Units of Measure.
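As a rough illustration of what a server or validator could check, the sketch below tests only the character-set part of that rule: the string is non-empty and consists solely of printable, non-whitespace 7-bit ASCII characters. This is a necessary condition, not a full UCUM parser, and the function name `is_plausible_ucum_ascii` is made up for this example.

```python
def is_plausible_ucum_ascii(unit: str) -> bool:
    """Cheap pre-check for a units string.

    Accepts only non-empty strings made of printable 7-bit ASCII
    characters without whitespace (code points 0x21 to 0x7E).
    It does NOT verify that the string is a valid UCUM expression.
    """
    return bool(unit) and all(0x21 <= ord(ch) <= 0x7E for ch in unit)
```

With this check, the UCUM case-sensitive code `Ao` for the Angstrom passes, while the symbol `Å` is rejected because it lies outside 7-bit ASCII. A string such as `m / s` is rejected because of the spaces.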

Should we write this up as a PR? It seems a very small change to the spec, and it only helps to clarify what should go into the unit field.

merkys commented 4 years ago

> Should we write this up as a PR? It seems a very small change to the spec, and it only helps to clarify what should go into the unit field.

I have prepared #299 according to @rartino's suggestion.