FAIRmat-NFDI / data-modeling


#41: "Abstract" base classes and inheritance in NeXus. Draw out possibilities and limitations. #44

Open sanbrock opened 1 year ago

sanbrock commented 1 year ago

NeXus inheritance, extensions and specialisations of doc strings and enumerations

Base classes and appdefs: NeXus supports two main categories of definitions: base classes and application definitions. The basic difference between them is the default optionality of the elements they define. In base classes, elements are “optional” by default. In application definitions, elements are "required" by default. In addition, elements can be marked as "recommended", but when to opt for this is a matter of preference. "Recommended" is interpreted as "optional".
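
The default-optionality rule above can be sketched in a few lines of Python. This is only an illustration: `DEFAULTS` and `effective_optionality` are hypothetical names, not part of any NeXus tool.

```python
from typing import Optional

# Hypothetical lookup table: default optionality per definition category.
DEFAULTS = {"base": "optional", "application": "required"}


def effective_optionality(category: str, declared: Optional[str] = None) -> str:
    """Return 'required' or 'optional' for an element.

    category: 'base' or 'application' (the kind of enclosing definition).
    declared: optionality set explicitly on the element, if any.
    """
    level = declared if declared is not None else DEFAULTS[category]
    # "recommended" is interpreted as "optional", as described above.
    return "optional" if level == "recommended" else level
```

For example, `effective_optionality("application")` gives "required", while `effective_optionality("application", "recommended")` gives "optional".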

The "extends" keyword: While most definitions extend NXobject, a few application definitions extend another application definition. As the documentation says: “In contrast to NeXus base classes, NeXus supports inheritance in application definitions.” However, since the keyword ‘extends’ so rarely points to anything other than NXobject in current application definitions, it is unclear whether and how such inheritance is implemented by different tools, and how inherited data items are handled inside the new definition. Is only extension supported (addition of new data items such as groups/fields/attributes), or also overriding, where already introduced elements could even be redefined?

Another question is how the data item definitions of base_classes are (re)used in another base class or in an application definition if no inheritance is supported. In practice, reuse is triggered by referencing a base class as a ‘type=’. The assumption here is that all data items defined in the tree of the referenced base class, and in the trees of the base classes referenced therein, are automatically available for reuse under a (not always specified) ‘name’ in the new definition. Hence, definitions are inherited inside base_classes as well. When such a reference also provides extra definition elements (e.g. doc in the case of NXbeam/DATA), these are handled as a specific definition which is only valid for this item (@sanbrock: item is unclear here, do you mean the specific class/base instance?) and which further specifies the original base definition (in the case of NXbeam/DATA(doc), the original NXdata(doc) is actually extended).
 (@sanbrock: is it extended or overwritten, i.e. will the specific NXbeam/DATA(doc) overwrite the original NXdata(doc)?)

Reuse in an application definition works the same way, except that optionality is switched to “required” by default (e.g. NXarpes/ENTRY/title(optional)=False as opposed to NXentry/title(optional)=True).
Note that although definitions are inherited, any modification of a specific data item (e.g. extending/specialising the documentation, changing optionality, adding new data items, etc.) results in a new item definition. This extended/modified item definition is then what gets inherited when the item is referenced inside another definition. E.g. NXmy_arpes/ENTRY/arpes_base[NXarpes] (just like NXmy_arpes(NXarpes)) would inherit NXarpes/ENTRY/INSTRUMENT/analyser[NXdetector]/acquisition_mode(enum:[swept, fixed]) rather than NXdetector/acquisition_mode(enum=[gated, triggered, summed, event, histogrammed, decimated]).
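
A toy model of this behaviour, assuming that a definition can be reduced to a flat dict of item names and that a reference copies the referenced definition before applying local changes. The helper `reference` is hypothetical and only meant to illustrate the acquisition_mode example:

```python
# Enumeration as defined in the base class NXdetector.
nxdetector = {
    "acquisition_mode": ["gated", "triggered", "summed", "event",
                         "histogrammed", "decimated"],
}


def reference(base: dict, modifications: dict) -> dict:
    """Return a new item definition: the referenced one plus local changes.

    The referenced definition is copied, never mutated, so the original
    base class stays untouched.
    """
    item = dict(base)
    item.update(modifications)
    return item


# NXarpes/ENTRY/INSTRUMENT/analyser[NXdetector] narrows the enumeration.
analyser = reference(nxdetector, {"acquisition_mode": ["swept", "fixed"]})

# A definition that references NXarpes without further changes sees the
# modified item, not the original NXdetector one.
my_analyser = reference(analyser, {})
```

Here `my_analyser["acquisition_mode"]` is the narrowed list from NXarpes, while `nxdetector` still carries the full enumeration.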

With its ‘type’-referencing definition-reuse functionality, NeXus implements Single, Multilevel and Hierarchical Inheritance (see https://beginnersbook.com/2013/05/java-inheritance-types/ ):
NXarpes/ENTRY/INSTRUMENT/analyser[NXdetector] extends the referenced NXentry/INSTRUMENT/DETECTOR (which references NXinstrument/DETECTOR, which in turn references NXdetector) with new data items, including the field ‘energies’. Inheritance in NeXus allows the reuse of a complete definition tree with all its inherited sub-definitions. A data item can be overridden after referencing it with the corresponding name/type combination. Note that for convenience, doc strings are not overridden but extended/specialised by default, and any overriding doc string shall explicitly state that inherited doc strings are not to be considered. (@sanbrock: mmh, for yaml definitions I do not see that we have implemented any functionality to decide which docstring should be used. Or is it that when we do not provide a docstring for the specialized class, the one from the base class kicks in by default, whereas when we do define a docstring for the specialized class, that docstring kicks in?)
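
The multilevel chain described above can be sketched with Python's collections.ChainMap, which looks a name up in the most specific level first and falls back to the more general ones. The contents of each level are illustrative only (the intermediate levels are assumed to add nothing), and `defined_in` is a hypothetical helper:

```python
from collections import ChainMap

# Levels from most specific to most general. Contents are illustrative.
levels = [
    ("NXarpes/ENTRY/INSTRUMENT/analyser",
     {"energies": "field", "acquisition_mode": ["swept", "fixed"]}),
    ("NXentry/INSTRUMENT/DETECTOR", {}),
    ("NXinstrument/DETECTOR", {}),
    ("NXdetector",
     {"acquisition_mode": ["gated", "triggered", "summed", "event",
                           "histogrammed", "decimated"]}),
]

# Lookup order follows the list: the first level that defines a name wins.
analyser = ChainMap(*(items for _, items in levels))


def defined_in(name: str):
    """Return the path of the most specific level defining `name`, or None."""
    for path, items in levels:
        if name in items:
            return path
    return None
```

With this model, `analyser["acquisition_mode"]` resolves to the NXarpes override, and `defined_in("energies")` reports that the field was introduced at the NXarpes level.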

Inheritance relationships:

IS A - implemented in NeXus by ‘extends=’ or ‘type=’

HAS or MAY CONTAIN (depending on optionality) - implemented in NeXus by explicit or inherited sub definitions

It is worth noting that, from a graph point of view, the relation "may_contain" is a tricky one. For a schema it is useful to know what may be there, but for an instance it is usually not sufficient to know that something may exist; you want to know whether the relation exists or not. In that sense, a NeXus schema represents a graph with special edge relations which are only instantiated once you have an instance.
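
One way to picture this: schema edges carry either "has" or "may_contain", and only an instance decides which of them become concrete edges. The sketch below assumes edges are (parent, relation, child) triples; the relation name "contains" and the function `instantiate` are made up for illustration.

```python
def instantiate(schema_edges, present):
    """Turn schema edges into instance edges.

    schema_edges: set of (parent, relation, child) triples, where relation
                  is "has" (required) or "may_contain" (optional).
    present:      set of (parent, child) pairs found in the instance.
    """
    instance_edges = set()
    for parent, relation, child in schema_edges:
        exists = (parent, child) in present
        if relation == "has" and not exists:
            raise ValueError(f"required child {child!r} missing under {parent!r}")
        if exists:
            # In the instance, the relation is no longer conditional.
            instance_edges.add((parent, "contains", child))
    return instance_edges


# Hypothetical schema: one required and one optional child.
schema = {("ENTRY", "has", "definition"), ("ENTRY", "may_contain", "title")}
edges = instantiate(schema, {("ENTRY", "definition")})
```

In this example `edges` holds a single concrete edge for "definition"; the optional "title" leaves no edge because the instance does not provide it.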

tomio13 commented 1 year ago

I was wondering how to indicate whether a doc is overwriting or extending... Maybe add a + or - sign as the first character of the text?
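
A minimal sketch of what such a marker could look like. It assumes "+" means extend and "-" means overwrite (the comment above does not fix the meaning of either sign), and that an unmarked doc extends by default, as described earlier in the thread. `merge_doc` is a hypothetical helper, not existing tooling.

```python
def merge_doc(inherited: str, own: str) -> str:
    """Combine an inherited doc string with the one of the specialised item.

    Assumed convention: a leading "-" overwrites the inherited doc,
    a leading "+" (or no marker at all) extends it.
    """
    if own.startswith("-"):
        return own[1:].lstrip()
    if own.startswith("+"):
        own = own[1:].lstrip()
    # Extend: keep the inherited text and append the specialised one.
    return f"{inherited}\n{own}" if inherited and own else inherited or own
```

One caveat of this convention: a doc string that legitimately starts with "-" or "+" (e.g. a bullet list) would be misread, so a real implementation would probably need a less ambiguous marker.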