NXdata linking of signal

FAIRmat-NFDI / nexus_definitions

Definitions of the NeXus Standard File Structure and Contents

https://manual.nexusformat.org/

Other

6 stars 8 forks source link

NXdata linking of signal #205

Closed domna closed 2 months ago

domna commented 6 months ago

This enables the option to describe where the signal data of an NXdata comes from. It's a similar concept as the AXISNAME_depends. It's just a proposal open for discussion and I'm not entirely sure if this is the right path to do it. So I welcome your feedback @lukaspie and @rettigl. Also general feedback from @FAIRmat-NFDI/areab is welcome.

Different options could be:

Additionally or instead provide a depends field for the NXdata group to denote that it was linked to somewhere else. This is essentially a shortcut for writing out AXISNAME_depends + DATA_depends if they belong to the same target group.
Allowing an array or different mechanism here to also describe composite data, e.g., if we combine two detectors together and need to denote which axis comes from where. There we'd need an additional mechanism to tie the axisname to the data depends.

This is coming from a recent discussion with Anders Hahlin and Anders Frisk. They are trying to combine multiple detector readings into the top-level data (e.g., for spin-resolved measurements).

lukaspie commented 6 months ago

I like the overall idea and agree that this is something that is needed. I am also not sure what would be the best approach.

Additionally or instead provide a depends field for the NXdata group to denote that it was linked to somewhere else. This is essentially a shortcut for writing out AXISNAME_depends + DATA_depends if they belong to the same target group.

In general, I have the feeling that NeXus more often uses a depends_on field or a @depends_on attribute to denote such concepts, instead of ELEMENT_depends. So maybe we could just use attributes @depends_on:

NXdata:
  \@depends_on(NX_CHAR):
    doc: |
      Points to the path of a field defining the data on which the `DATA` field depends.
  [...]
  AXISNAME(NX_NUMBER):
    \@depends_on(NX_CHAR):
      Points to the path of a field defining the axis on which the ``AXISNAME`` axis depends.

Then, these attributes would anyway be associated with the group/field, so there would be no need for the uppercase notation.

In any case, I would argue it should be the same for the NXdata group and the corresponding AXISNAME fields. That is, if we keep AXISNAME_depends, we should also use DATA_depends (as @domna originally suggested). This would also be in line with having \@AXISNAME_indices as an immediate child of the NXdata group.

tomio13 commented 6 months ago

I would vote for a generic \@depends_on attribute, because it is a clean approach, and we do not have to specify what we mean for every element.

domna commented 6 months ago

In any case, I would argue it should be the same for the NXdata group and the corresponding AXISNAME fields. That is, if we keep AXISNAME_depends, we should also use DATA_depends (as @domna originally suggested). This would also be in line with having \@AXISNAME_indices as an immediate child of the NXdata group.

I agree and I think we cannot fully get rid of the AXISNAME_depends. If we only allow a top-level @depends we can only reference full NXdata groups but especially for the different axes we want to gather them from different transformations, i.e., explicitly not from a single NXdata group. Therefore, my suggestion is that we do DATA_depends and additionally allow a top-level depends which allows for referencing a whole NXdata group, i.e., it's a convenience for writing out all the depends if they come from the same group. Alternatively, we just omit the latter.

However, I was also thinking about the name _depends in general. For me it reads that the axis values itself depend on another field. This means that the values have to somehow stacked on top the other field. What we actually do is reference/link something.

rettigl commented 5 months ago

I agree and I think we cannot fully get rid of the AXISNAME_depends.

I think with Luka's suggestion to just make these attributes to the AXISNAME and DATA fields, we can get rid of them. I would suggest to not re-use @depends_on, because this is as far as I know reserved for NXtransformations. We can call it e.g. @ reference or @ source or so.

domna commented 5 months ago

I agree and I think we cannot fully get rid of the AXISNAME_depends.

I think with Luka's suggestion to just make these attributes to the AXISNAME and DATA fields, we can get rid of them. I would suggest to not re-use @depends_on, because this is as far as I know reserved for NXtransformations. We can call it e.g. @ reference or @ source or so.

Yes, I just generally meant that we cannot remove the depends associated with AXISNAME and DATA (that is as _depends or as attribute) and only use the top-level depends.

I like @reference. So the proposal based on Lukas' suggestion would be:

NXdata:
  \@reference(NX_CHAR):
    doc: |
      Points to the path of a field defining the data to which this NXdata group refers.
  [...]
  AXISNAME(NX_NUMBER):
    \@reference(NX_CHAR):
      Points to the path of a field defining the axis to which this ``AXISNAME`` axis refers.
  DATA(NX_NUMBER):
    \@reference(NX_CHAR):
       Points to the path of a field defining the axis to which the ``DATA`` refers.

We could also propose @reference as a general concept for when an array is taken/linked to somewhere else as we do this quite a few times in the appdef. That way users always know where the data comes from as hdf5 linking is not really visible when viewing a file.

mkuehbach commented 5 months ago

I agree that having an explicit way how to define that a concept A has a connection to another concept B in addition to the explicit parent, child relationships (that come with the concept tree that a base class and appdef defines) makes sense.

My argument always against using \@depends_on (although conceptually this makes sense) was that the symbol "depends_on" should be reserved for cases of NXtranslations. In this spirit I support "reference" I also can see an argument to use "refers_to" or: and this I would find better why at all use one symbol only? The point is the more we add these explicit decoration (of which I am not against of doing) is that we approach an implementation of how one would make statements in e.g. RDF, then why not use e.g. "has_a", "is_a", "is_equivalent_to" not also as \@has_a symbols in NeXus? This could also be used then in the NeXusOntology. The only point why I am hesitant going down this route is that we could then alternatively start off with formulating these references and additional connections between concepts directly using accepted semantic web technology thus watering our message.

tomio13 commented 5 months ago

One thing I am not sure I understand: why is @depends_on reserved for only one base class? The point I meant was why not to use it generally when the meaning is quite clear? As long as we express that the current element needs the definition of a previous one, I think it is a clear definition. I am also fine using another synonym, but the point I am pressing here: pick one, and make it usable in all cases. Perhaps implying a class binding as well, that is a NXdata refers back to data only (numeric array), an NXtransformations to another NXtransformations, etc. I find defining many synonyms with all special cases confusing and difficult to keep up with (I know, I am not clever enough for that...)

domna commented 5 months ago

The point is the more we add these explicit decoration (of which I am not against of doing) is that we approach an implementation of how one would make statements in e.g. RDF, then why not use e.g. "has_a", "is_a", "is_equivalent_to" not also as @has_a symbols in NeXus? This could also be used then in the NeXusOntology.

I agree with you that these terms are very useful for our description work. However, in this case here I would say that the concept of @reference is slightly different. It basically refers a dataset from somewhere else when I create an entity of data. Something like same_as feels similar but lies more on the conceptual level for me, e.g., I want to say that the NXsample/temperature is the same as NXmanipulator/sample_holder/temperature and we just create synonyms for the same data. This fixes everything on the appdef level, always. Contrary, the reference is a dynamic link which can especially be useful in the NXdata case where we leave it up to the user to construct the data themself.

One thing I am not sure I understand: why is @depends_on reserved for only one base class? The point I meant was why not to use it generally when the meaning is quite clear?

I agree with you and I think that @depends_on is also useful in other contexts. However, here I think the term "depends" (in whatever form) shouldn't be used in the first place as these values don't depend on, a.k.a. stack with the previous one. They rather reference the value from somewhere else.

lukaspie commented 3 months ago

LGTM. We should, however, also change the respective appdefs that use these concepts (NXmpes, NXmpes_arpes, NXxps (?)).

I rebased this branch and then used the @reference attribute in NXmpes and NXmpes_arpes. For NXxps, I would handle it in #249.

IMO, we can merge this PR now.