NeurodataWithoutBorders / nwb-schema

Data format specification schema for the NWB neurophysiology data format
http://nwb-schema.readthedocs.io
Other
53 stars 16 forks source link

Need a mechanism to extend schema without subclassing #141

Closed campagnola closed 4 years ago

campagnola commented 6 years ago

General Problem

There are several places in the schema where we run into problems caused by the use of class inheritance as the sole mechanism of extension. It is often desirable to have a class that combines properties from multiple superclasses, but it is not straightforward to achieve this combination without either modifying the core classes or duplicating code. Examples: #131 #129

The typical case that we have seen repeated is that a base class (such as ImageSeries, or PatchClampSeries) comes in many different varieties, and each one implies different types/combinations of metadata. Attempting to use an inheritance hierarchy to describe this variety leads to combinatorial explosion, so instead we need a compositional approach (for background read any of the many web articles on "inheritance vs composition").

A specific example

The core spec currently contains two classes ImageSeries and TwoPhotonSeries, where the latter class inherits from the former and adds on metadata specific to a 2-photon imaging session, such as scan_line_rate and pmt_gain. Now we want to create a new class Image that is just like ImageSeries, but representing a single snapshot in time. Things get tricky from here:

The number of classes that would be needed to describe this combinatorial space grows multiplicatively with each new piece of metadata. A more natural approach is to simply say: we have a base Image class, and it contains any number of extra metadata pieces. However switching to a compositional approach requires careful consideration; we still want to have a strongly defined schema, and there are many different possible approaches to implementing composition.

What kind of composition to use?

We want a solution with the following properties:

Proposed solution

Python API example:

# Video data is acquired with a camera. To store this,
# create ImageSeries and CameraMeta objects, and then attach
# them together:
image = ImageSeries(data)
cam_meta = CameraMeta(exposure_time=10e-3, binning=(2,2))
image.extend(cam_meta)  # or maybe `image.attach(...)`

# A shorthand version of the above:
image = ImageSeries(ext=[CameraMeta(...)])

# Now we can see that this is a camera image:
image.has_interface(CameraMeta)  # returns True

# And perhaps we can specify that the CameraMeta and TwoPhotonMeta
# interfaces are mutually exclusive
image.extend(TwoPhotonMeta(...))   # raises an exception

# We can also directly access the attributes that were created by the 
# CameraMeta interface
image.exposure_time   # returns 10e-3

# A more complex example:
image = ImageSeries(ext=[
    CameraSyncSignal(trigger_data=...),
    CameraMeta(exposure_time=10e-3, ext=[
        ImagingFilterMeta(excitation_wavelength=568, emission_wavelength=730)
    ])
])

A core spec modification is needed in order to store information about the interfaces attached to any object:

- neurodata_type_def: NWBContainer
  groups:
    - name: ext
      groups:
         - neurodata_type_inc: NWBContainer
      quantity: *
      datasets:
         - neurodata_type_inc: NWBData

And the Interface objects created above would need to be described in the spec (core or extension) as well:

- neurodata_type_def: CameraImageMeta
  neurodata_type_inc: Interface
  neurodata_type_parent: Image
  datasets:
  - name: exposure_time
    dtype: float
    doc: Duration of each frame exposure in seconds
  - name: exposure_time
    dtype: float
    doc: Duration of each frame exposure in seconds

Open questions

oruebel commented 6 years ago

From what I can tell, this can be expressed with the current schema language, i.e., you have a group for storing metadata that allows you to add zero or more metadata containers. I don't think we need to modify NWBContainer for this (not all types need this feature) but you could just create a new base type that extends NWBContainer

campagnola commented 6 years ago

The suggestion above was the result of a discussion I had with @ajtritt at the end of the hackathon. I actually do think all types need this support so that people writing their own metadata extensions are not tempted to use subclassing. If we continue down this path of subclassing, we will have many extensions created by the community that cannot be used together. At best, this will result in a lot of code duplication. At worst, this will discourage extension sharing and prevent further standardization.

The suggestion above also has some real benefits over simply making a metadata group: