FAIRmat-NFDI / nexus_definitions

Definitions of the NeXus Standard File Structure and Contents
https://manual.nexusformat.org/

How to correctly use NXprocess #181

Open lukaspie opened 4 months ago

lukaspie commented 4 months ago

Problem

In the NXprocess base class, the docstring says: "Document an event of data processing, reconstruction, or analysis for this data."

This suggests that one NXprocess should describe one event of data processing. However, NXprocess can currently contain multiple instances of NXregistration, NXdistortion, and NXcalibration, suggesting that a single NXprocess instance can hold multiple "events". This is inconsistent, and it makes the other fields in NXprocess that relate to the order of processing (like sequence_index) hard to use consistently.

My suggestion

In #177, we introduced the base class NXhistory for describing the history of a physical entity. NXhistory can hold multiple NXactivity instances as well as NXphysical_process and NXchemical_process. I propose to extend NXhistory such that it can also describe the history of processing events:

NXhistory base class:

(NXhistory):
  (NXactivity):
  (NXphysical_process):
  (NXchemical_process):
  (NXprocess):
  (NXregistration):
  (NXdistortion):
  (NXcalibration):

Then, on the app-def level, we can write:

(NXentry):
  processing_history(NXhistory):
    (NXprocess): # with base class inheritance, could be any of NXcalibration, NXregistration, NXdistortion, NX
    (NXregistration):
    (NXdistortion):
    (NXcalibration):
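
To make the app-def snippet above more concrete, here is a minimal sketch of what one instance of it might look like, modelled as a nested Python dict rather than a real HDF5 file. The group names (energy_calibration, distortion_correction, frame_registration), the description field, and the helper processing_steps are assumptions for illustration only; they are not part of the proposal or of any NeXus tooling.

```python
# Hypothetical in-memory picture of an NXentry that follows the proposed layout.
# Group names are made up; only the NX_class values come from the proposal.
entry = {
    "NX_class": "NXentry",
    "processing_history": {
        "NX_class": "NXhistory",
        "description": "free-text field, not a group",
        "energy_calibration": {"NX_class": "NXcalibration"},
        "distortion_correction": {"NX_class": "NXdistortion"},
        "frame_registration": {"NX_class": "NXregistration"},
    },
}

# Classes that the proposed NXhistory would accept as processing events.
PROCESS_LIKE = {"NXprocess", "NXcalibration", "NXdistortion", "NXregistration"}


def processing_steps(history):
    """Return {group name: NX_class} for every process-like child group."""
    return {
        name: child["NX_class"]
        for name, child in sorted(history.items())
        if isinstance(child, dict) and child.get("NX_class") in PROCESS_LIKE
    }


steps = processing_steps(entry["processing_history"])
```

The point of the sketch is that a reader only needs the NX_class of each child to collect the processing events, whatever the groups are called.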

Additional ideas

1) Eventually, the idea would be that each of these base classes (incl. NXprocess) extends NXactivity (via base class inheritance) and gets a timestamp as well as a sequence index, to fully describe the chain of events that occurred.

2) There is the idea that NXhistory is a graph whose nodes are NXactivity instances (and similar). We could make the edges of this graph more explicit by using or modifying the existing NXgraph_* base classes.

3) We have been discussing how to describe the sequence of measurement events in the MPES framework (see #173). Maybe we could describe these measurement events as sets of NXactivity instances in the future.
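
A rough sketch of idea 1), assuming the inheritance chain NXactivity -> NXprocess -> NXcalibration / NXregistration. The Python classes below are stand-ins invented for illustration; the field names start_time and program, and the chain helper, are likewise assumptions and not taken from the NeXus definitions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Activity:
    """Stand-in for NXactivity: something that happened at a point in time."""
    name: str
    start_time: datetime
    sequence_index: Optional[int] = None


@dataclass
class Process(Activity):
    """Stand-in for NXprocess, assumed here to extend NXactivity."""
    program: str = ""


@dataclass
class Calibration(Process):
    """Stand-in for NXcalibration, assumed here to extend NXprocess."""


@dataclass
class Registration(Process):
    """Stand-in for NXregistration, assumed here to extend NXprocess."""


@dataclass
class History:
    """Stand-in for NXhistory: an unordered collection of activities."""
    activities: List[Activity] = field(default_factory=list)

    def chain(self) -> List[Activity]:
        # Order by timestamp; sequence_index (0 if missing) only breaks ties.
        def key(a: Activity):
            index = a.sequence_index if a.sequence_index is not None else 0
            return (a.start_time, index)

        return sorted(self.activities, key=key)


history = History([
    Registration("align frames", datetime(2024, 1, 1, 12, 5), 2),
    Calibration("energy axis", datetime(2024, 1, 1, 12, 0), 1),
    Activity("sample annealing", datetime(2024, 1, 1, 9, 0)),
])
ordered = [a.name for a in history.chain()]
```

Because every stand-in derives from Activity, the history can mix sample-preparation events and data-processing events and still be ordered with one rule.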

What do you think @FAIRmat-NFDI/areab?

tomio13 commented 4 months ago

I would start by asking what an 'event of data processing' is. If we mean a set of steps that produces a single new data output, then it makes sense to allow multiple objects which together make up this step.

mkuehbach commented 4 months ago

#140 with NXapm is a typical example of how NIAC has thought of NXprocess: e.g., the processing of raw detector hits into calibrated time of flight is decomposed into a sequence of NXprocess instances. I am happy with this. However,

NXprocess has so far been designed with only a single sequence_index. This implies that the processing is a linear sequence, which is a much simpler graph than those typically used. E.g., if you have a Y-junction where the results of two NXprocess instances are necessary inputs to another NXprocess, which sequence_index should the input NXprocess instances have?

The idea of using NXhistory essentially states that we also wish to describe such junctions as what they are: a graph with NXprocess instances as nodes and directed edges connecting them. This is the essence I support. Most workflows can be modelled as triplets: some input (at least one) is fed to some functor (an action/process, with some set of algorithms happening in this box), which generates some output (at least one).
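
The Y-junction case can be sketched with the Python standard library. This is only an illustration of the triplet view (inputs, functor, outputs); the step names and data names are hypothetical, and nothing here reflects how NXhistory or the NXgraph_* base classes would actually store edges.

```python
from graphlib import TopologicalSorter

# Each functor (processing step) with its inputs and outputs.
# All names are made up for this example.
steps = {
    "calibrate_energy": {"inputs": ["raw_hits"], "outputs": ["calibrated"]},
    "correct_distortion": {"inputs": ["raw_image"], "outputs": ["undistorted"]},
    "merge": {"inputs": ["calibrated", "undistorted"], "outputs": ["result"]},
}

# Map each data object to the step that produces it.
producer = {
    out: name
    for name, step in steps.items()
    for out in step["outputs"]
}

# Directed edges: a step depends on every step that produced one of its inputs.
depends_on = {
    name: {producer[i] for i in step["inputs"] if i in producer}
    for name, step in steps.items()
}

# One valid execution order; the two branches of the Y may come in either order.
order = list(TopologicalSorter(depends_on).static_order())
```

Both "calibrate_energy, correct_distortion, merge" and "correct_distortion, calibrate_energy, merge" are valid orders here, so a single sequence number per step cannot express that the first two steps are independent, whereas the edge set in depends_on can.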