NeurodataWithoutBorders / pynwb

A Python API for working with Neurodata stored in the NWB Format
https://pynwb.readthedocs.io
Other
178 stars 84 forks source link

[Feature]: Allow passing DynamicTable for ElectricalSeries.electrodes and RoiResponseSeries.rois #1920

Open rly opened 4 months ago

rly commented 4 months ago

What would you like to see added to PyNWB?

In my experience, the vast majority of users who create an ElectricalSeries or RoiResponseSeries want to reference all the electrodes or all the ROIs. Creating a DynamicTableRegion is a conceptual and technical hurdle. I think it would be easier for users if we allow them to pass the Electrodes table object to ElectricalSeries, in which case we create a DynamicTableRegion that references all the electrodes under the hood. Users should still be able to pass a DynamicTableRegion if they want to use a subset of the electrodes.

Downside: this may cause some confusion when people are navigating the file and encounter a DynamicTableRegion.

Thoughts? @bendichter ?

Is your feature request related to a problem?

No response

What solution would you like?

Shortcut that avoids the user having to create and link a DynamicTableRegion to create an ElectricalSeries or RoiResponseSeries.

Do you have any interest in helping implement the feature?

Yes.

Code of Conduct

rly commented 4 months ago

This change would be accompanied by a change in our docs that moves explanation of the DynamicTableRegion into a note box. This would indicate that using a DynamicTableRegion is more of a special case and can be skipped if it does not apply to you.

bendichter commented 4 months ago

I like this idea

oruebel commented 4 months ago

A possible alternative may be to have convenience methods to create the DynamicTableRegion. E.g, something like 'electrodes.all' or 'electrodes.region[3:7]' producing a DynamicTableRegion, which would be a shortcut but still make it explicit that you are using a selection and not the whole table. This could probably be a genetic method on DynamicTable itself.

bendichter commented 4 months ago

hmm I suppose we could do both

ElectricalSeries(..., electrodes=electrodes)

and you could also do

ElectricalSeries(..., electrodes=electrodes.region[3:7])

I think that would be a really clean user interface!

rly commented 4 months ago

I like @bendichter 's latest suggestion.

While we are rethinking this, what do you think about using the term "selection" or "rows" instead of "region"? "Region" to me suggests a contiguous selection, but we allow specifying a list of non-adjacent rows. "Region" is also not clear that it is a selection of rows and not some other grouping of rows and columns. Pandas uses DataFrame.iloc but that can include selecting columns as well as both rows and columns at once, and therefore might confuse people who expect the same behavior.

Separately, DynamicTableRegion requires a "description" and the proposed syntax does not allow setting the description. The description is required because DTR is a subtype of VectorData and VectorData requires "description". In practice, the DTR description is quite generic and not very informative. Scanning DANDI, here is a sampling of unique descriptions (which includes both uses in TimeSeries and DynamicTable objects). Many (40+) use either the first or second description below.

electrode_table_region
the electrodes that each spike unit came from
table region for rois in this segmentation
good roi region table
ECoG electrodes on brain
lfp channels on probe 810755797
lfp channels on probe 773592320
segmented cells with cell_specimen_ids
Segmented dendrites (height x width)
segmented cells with cell_specimen_ids
electrode table reference
'shank1 region',
  'shank2 region',
  'shank3 region',
  'shank4 region',
  'shank5 region',
  'shank6 region',
  'shank7 region',
  'shank8 region',
  'shank9 region'
electrode
unique cell ROI
region for Imaging plane0
all ROIs
Channel that observed the maximum waveform amplitude for the unit.
'ROIs for plane_0',
  'ROIs for plane_1',
  'ROIs for plane_2'
table region for rois in this segmentation
single electrodes
all cells
Electrodes involved with these spike events
all electrodes
All identified units
...

(In the vast majority of dandisets (>95%), it looks like the DynamicTableRegion spans the entire table, but I have not done a comprehensive query.)

We could set a default placeholder description such as "A selection of rows" or "no description", but I do not like how the default TimeSeries description is "no description". We cannot make the description optional because that breaks the rules of inheritance (VectorData requires it).

@oruebel @bendichter Any suggestions on what to set for the description or how to get around this?

bendichter commented 4 months ago

I like region because we are created a DynamicTableRegion. We could include descrption in the syntax: electrode.region(selection, description). In the case that they don't use the region method, we can auto-populate "all rows of table X"