catalystneuro / ndx-extracellular-channels

BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

[Proposal] The return of ElectrodesTable #3

Open rly opened 1 month ago

rly commented 1 month ago

Problem: Neuropixels 1.0 probes write data from the same physical electrode contact to two different streams, AP and LF [ref].

In the latest iteration #1 , a Neuropixels 1.0 recording would consist of:

If the user wants to specify information about the probe insertion, then they have to create two identical ProbeInsertion objects, one for each ChannelsTable. If the user wants to specify information about the estimated and confirmed positions of each contact, they have to specify it in both ChannelsTable objects.

Alternatively, we could pull out these physical properties of the inserted probe into an ElectrodesTable. This ElectrodesTable would contain:

  1. a child ProbeInsertion table
  2. estimated_brain_area
  3. any metadata specific to a physical contact that affects all channels/streams, like whether it is dead

This ElectrodesTable would contain information about the physical electrode contact. The ChannelsTable would contain information about virtual channels, or data streams, that may have different properties (e.g., hardware filtering, gain, maybe referencing).

@bendichter @stephprince and I are also proposing a separate object (CoordinatesTable?) to store the estimated localization and confirmed localization of electrodes in some reference space and atlas. It would have coordinates, brain area, method, and reference space information. One object for estimated, one object for confirmed using method 1, one object for confirmed using method 2 (in case there are two). Each CoordinatesTable would point to the ElectrodesTable. (Under the model of #1, each CoordinatesTable would have had to point to both ChannelsTable objects above or be duplicated, which are not ideal.)

Pro: Reduced duplicate data

Con: For people storing non-Neuropixels 1.0 data, this ElectrodesTable will feel needlessly complex. We may be able to address that by adding convenience functions in the Python API.

@CodyCBakerPhD @h-mayorquin what do you think?

CodyCBakerPhD commented 1 month ago

Can you draw it up in mermaid really quick? What is the different between your 'Electrode' and a 'Contact'? A separate object for physical space seems fine, but how optional will it be in the link for the final 'ElectricalSeries' (ExtracellularSeries here I think to avoid namespace collision)

h-mayorquin commented 1 month ago

So, in your proposal how would the ElectrodesTable would be linked or related to the ChannelsTable to avoid this problem. This is not clear to me.

I have another proposal to solve the data redundancy:

In the current electrodes table each row can have an associated electrode_group, right?

Could we use the same mechanism to make the link to probe and insertion in the ChannelsTable at the row level instead of an attribute as it is now? With this, we can link a set of channels to the same Probe for each of the two ChanenlsTable in the neuropixels case. This will avoid the redundancy of having two probes and two probe insertions in the nwbfile for this case without creating more objects.

Plus, I am dealing with a conversion right now where an ExtracellularSeries comes from two probes. This is not covered with the current model but could be easily fixed with my proposal above.

rly commented 1 month ago

@CodyCBakerPhD

From #1 (Updated 2024-07-22):

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#ffffff', "primaryBorderColor': '#144E73', 'lineColor': '#D96F32'}}}%%

classDiagram
    direction LR

    class ExtracellularSeries {
        <<TimeSeries>>

        data : numeric
        --> unit : str = "microvolts"
        channels : DynamicTableRegion
        --> target : ChannelsTable
        channel_conversion : List[float], optional
        --> axis : int = 1
    }

    class ChannelsTable {
        <<DynamicTable>>
        --------------------------------------
        attributes
        --------------------------------------
        name : str
        description : str
        probe : Probe
        position_reference : str, optional
        electrical_reference_description : str, optional
        ground : str, optional
        position_confirmation_method : str, optional

        --------------------------------------
        columns
        --------------------------------------
        id : VectorData[int]
        contact : DynamicTableRegion
        --> target : ContactsTable
        reference_contact :  DynamicTableRegion, optional
        --> target : ContactsTable
        filter : VectorData[str], optional
        ---> Strings such as "Bandpass 0-300 Hz".
        estimated_position_ap_in_mm : VectorData[float], optional
        estimated_position_ml_in_mm : VectorData[float], optional
        estimated_position_dv_in_mm : VectorData[float], optional
        estimated_brain_area : VectorData[str], optional
        confirmed_position_ap_in_mm : VectorData[float], optional
        confirmed_position_ml_in_mm : VectorData[float], optional
        confirmed_position_dv_in_mm : VectorData[float], optional
        confirmed_brain_area : VectorData[str], optional
        ... Any other custom columns, e.g., ADC information
    }

    class ProbeInsertion {
        <<Container>>
        insertion_position_ap_in_mm : float, optional
        insertion_position_ml_in_mm : float, optional
        insertion_position_dv_in_mm : float, optional
        position_reference : str, optional
        hemisphere : Literal["left", "right"], optional
        insertion_angle_pitch_in_deg : float, optional
        insertion_angle_roll_in_deg : float, optional
        insertion_angle_yaw_in_deg : float, optional
        depth_in_um : float, optional
    }

    namespace ProbeInterface {
        class Probe {
            }

            class ProbeModel {
            }

            class ContactsTable {
            }
    }

    Probe *--> ProbeInsertion: might contain ProbeInsertion
    ExtracellularSeries ..> ChannelsTable : links to channels
    ChannelsTable *..> Probe : links to probe
    ChannelsTable ..> ContactsTable : row reference to contact
    note for ChannelsTable "ChannelsTable is no longer global"

This results in:

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#ffffff', "primaryBorderColor': '#144E73', 'lineColor': '#D96F32'}}}%%

classDiagram
    class ExtracellularSeriesAP {
        <<ExtracellularSeries>>
    }
    class ExtracellularSeriesLF {
        <<ExtracellularSeries>>
    }
    class ChannelsTableAP {
        <<ChannelsTable>>
    }
    class ChannelsTableLF {
        <<ChannelsTable>>
    }
    class ProbeInsertion {
        <<ProbeInsertion>>
    }
    class ContactsTable {
        <<ContactsTable>>
    }

    ExtracellularSeriesAP ..> ChannelsTableAP : links to channels
    ExtracellularSeriesLF ..> ChannelsTableLF : links to channels
    ChannelsTableAP *..> Probe : links to probe
    ChannelsTableLF *..> Probe : links to probe
    ChannelsTableAP ..> ContactsTable : row reference to contact
    ChannelsTableLF ..> ContactsTable : row reference to contact
    Probe *--> ProbeInsertion: might contain ProbeInsertion

Proposed (add ElectrodesTable and AnatomicalCoordinatesTable):

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#ffffff', "primaryBorderColor': '#144E73', 'lineColor': '#D96F32'}}}%%

classDiagram
    direction LR

    class ExtracellularSeries {
        <<TimeSeries>>

        data : numeric
        --> unit : str = "microvolts"
        channels : DynamicTableRegion
        --> target : ChannelsTable
        channel_conversion : List[float], optional
        --> axis : int = 1
    }

    class ChannelsTable {
        <<DynamicTable>>
        --------------------------------------
        attributes
        --------------------------------------
        name : str
        description : str
        reference_mode : str, optional

        --------------------------------------
        columns
        --------------------------------------
        id : VectorData[int]
        electrode : DynamicTableRegion
        --> target : ElectrodesTable
        reference_electrode :  DynamicTableRegion, optional
        --> target : ElectrodesTable
        filter : VectorData[str], optional
        ---> Strings such as "Bandpass 0-300 Hz".
        ... Any other custom columns, e.g., ADC information
    }

    class ElectrodesTable {
        <<DynamicTable>>
        --------------------------------------
        attributes
        --------------------------------------
        name : str
        description : str
        probe : Probe
        probe_insertion : ProbeInsertion, optional

        --------------------------------------
        columns
        --------------------------------------
        id : VectorData[int]
        contact : DynamicTableRegion
        --> target : ContactsTable
        estimated_brain_area : VectorData[str], optional
        ... Any other custom columns about physical electrode
    }

    class ProbeInsertion {
        <<Container>>
        insertion_position_ap_in_mm : float, optional
        insertion_position_ml_in_mm : float, optional
        insertion_position_dv_in_mm : float, optional
        position_reference : str, optional
        hemisphere : Literal["left", "right"], optional
        insertion_angle_pitch_in_deg : float, optional
        insertion_angle_roll_in_deg : float, optional
        insertion_angle_yaw_in_deg : float, optional
        depth_in_um : float, optional
    }

    class AnatomicalCoordinatesTable {
        <<DynamicTable>>
        --------------------------------------
        attributes
        --------------------------------------
        name : str
        description : str
        space : Space - not shown, defines x, y, z reference space
        method : str - e.g., "reconstruction from histology using SHARP-Track" or "estimated"

        --------------------------------------
        columns
        --------------------------------------
        id : VectorData[int]
        x : VectorData[float]
        y : VectorData[float]
        z : VectorData[float]
        brain_area : VectorData[str]
        target_object : DynamicTableRegion
        --> target : ElectrodesTable or FibersTable
    }

    namespace ProbeInterface {
        class Probe {
            }

            class ProbeModel {
            }

            class ContactsTable {
            }
    }

    ExtracellularSeries ..> ChannelsTable : links to channels
    ElectrodesTable *..> Probe : links to probe
    ChannelsTable ..> ElectrodesTable : row reference to electrode
    ElectrodesTable ..> ContactsTable : row reference to contact
    Probe *--> ProbeInsertion: might contain ProbeInsertion
    note for ChannelsTable "ChannelsTable is no longer global"
    note for ElectrodesTable "ElectrodesTable is no longer global"
    AnatomicalCoordinatesTable ..> ElectrodesTable : row reference to electrode

This results in:

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#ffffff', "primaryBorderColor': '#144E73', 'lineColor': '#D96F32'}}}%%

classDiagram
    class ExtracellularSeriesAP {
        <<ExtracellularSeries>>
    }
    class ExtracellularSeriesLF {
        <<ExtracellularSeries>>
    }
    class ChannelsTableAP {
        <<ChannelsTable>>
    }
    class ChannelsTableLF {
        <<ChannelsTable>>
    }
    class ElectrodesTable {
        <<ElectrodesTable>>
    }
    class ProbeInsertion {
        <<ProbeInsertion>>
    }
    class ContactsTable {
        <<ContactsTable>>
    }
    class EstimatedAnatomicalCoordinatesTable {
        <<AnatomicalCoordinatesTable>>
    }
    class ConfirmedAnatomicalCoordinatesTable {
        <<AnatomicalCoordinatesTable>>
    }

    ExtracellularSeriesAP ..> ChannelsTableAP : links to channels
    ExtracellularSeriesLF ..> ChannelsTableLF : links to channels
    ElectrodesTable *..> Probe : links to probe
    ChannelsTableAP ..> ElectrodesTable : row reference to electrode
    ChannelsTableLF ..> ElectrodesTable : row reference to electrode
    ElectrodesTable ..> ContactsTable : row reference to contact
    Probe *--> ProbeInsertion: might contain ProbeInsertion
    EstimatedAnatomicalCoordinatesTable ..> ElectrodesTable : row reference to electrode
    ConfirmedAnatomicalCoordinatesTable ..> ElectrodesTable : row reference to electrode
rly commented 1 month ago

@h-mayorquin The ElectrodesTable would have a column "contact". Each row of "contact" would be a reference to a row of the ContactsTable.

Could we use the same mechanism to make the link to probe and insertion in the ChannelsTable at the row level instead of an attribute as it is now? With this, we can link a set of channels to the same Probe for each of the two ChanenlsTable in the neuropixels case. This will avoid the redundancy of having two probes and two probe insertions in the nwbfile for this case without creating more objects.

In the current proposal, there would be one Probe and one ContactsTable, so the row-wise reference to Probe is not necessary. The ChannelsTable could contain an attribute (or row-wise reference) to the same ProbeInsertion in the same way that it links to the same Probe. That would help. But information about the estimated and confirmed positions and brain areas of each electrode would still be duplicated across the ChannelsTableAP and ChannelsTableLF.

Plus, I am dealing with a conversion right now where an ExtracellularSeries comes from two probes. This is not covered with the current model but could be easily fixed with my proposal above.

Why not make a separate ExtracellularSeries for each probe?

h-mayorquin commented 1 month ago

Why not make a separate ExtracellularSeries for each probe? I am thinking like this:

So as you mentioned in #1 of the devices tyipically present in an ecephys setup we only represent the probes as nwb devices. The DAQ/Recording and the headstages are lumped into the channels table and we do the wiring through the contact column in #1. I think clumping headstage-daq into the channels table is a good trade-off between simplicity and expresivity.

To make this concrete some specific setups are:

1) I want to keep everything into one channel table because that table represents the Intan acquistion system. They are the same thing in the physical word and I would like to represent that in nwb. Otherwise, I am duplicating information about references and other channel information that is common across the intan setup.

2) Spikeinterface/Neo do not have a way to automatize this process. There is no property in the format that we could use to route the data from each probe to a different channels table. Requiring user to input the map during the conversion is a bottleneck that I would not like to backed in the format.

I am not sure how unique is this case but there are definitley more DAQs that can take more that one probe and will represent data as one stream.

I think doing the mappings to the other structures (contacts, probes, andanatomatical tables) per row of the channels table is very neat because it allows us to create simple automated conversions that allow an increasing but not limiting degree of provenance capture if the information is available. In short, if the information is available you map the channels to the information and if not, then you at least can add the channels.

More concrently:

With this mapping your scenario only needs one device and my scenario only needs one channel table. Adding anatomical coordinates does not need an extra table because we just indicate a mapping.

Maybe there was a good reason we decided against doing a per-row mapping and I do not remember. In that case, please let me know.

rly commented 1 month ago

@h-mayorquin If I understand your proposal correctly, each row in the ChannelsTable would link to a Probe, and if you have two Neuropixels probes, the ExtracellularSeries for AP data would contain the AP data for both probes, and the linked ChannelsTable would contain all 768 channels, where the first 384 link to Probe 1 and the second 384 link to Probe 2. The LF data would go into a separate ExtracellularSeries and ChannelsTable, for a total of two ExtracellularSeries and two ChannelsTable objects.

In the original proposal, we would have four ExtracellularSeries and four ChannelsTable objects. Each ChannelsTable would be associated with one Probe at the top level, not one per channel.

Your proposal makes sense. However, in current NWB, data from a DAQ are sometimes organized into different objects to improve interpretability and reuse. For example, position, eye tracking, joystick movements, and other behavioral data are often acquired by the DAQ as analog channels, and these are stored in different NWB neurodata types. I think data from an ECoG grid and a probe recorded simultaneously are usually stored in two separate ElectricalSeries objects but one ElectrodesTable because there is only one (and then each row points to different electrode groups and devices).

I want to keep everything into one channel table because that table represents the Intan acquistion system. They are the same thing in the physical word and I would like to represent that in nwb. Otherwise, I am duplicating information about references and other channel information that is common across the intan setup.

I think the references and other channel information will not always be the same across channels, so it would be best to store these on a per-channel basis for maximum expressivity. This will already results in duplication regardless of whether Probe is specified per-channel.

Spikeinterface/Neo do not have a way to automatize this process. There is no property in the format that we could use to route the data from each probe to a different channels table. Requiring user to input the map during the conversion is a bottleneck that I would not like to backed in the format.

I agree, but the experimenter should have the information of which channels belonged to which probe, and I think splitting data from the DAQ into separate objects by probe would slightly improve interpretability and downstream processing of the data. A user might want to spike sort all channels of the same probe together and separately from channels from another probe. They can do that under either proposal, but it may be slightly more intuitive under the original proposal.

Maybe there was a good reason we decided against doing a per-row mapping and I do not remember. In that case, please let me know.

I do not remember; I think it felt cleaner and as a way of reducing duplication.

I do not feel strongly about having each channel in a ChannelsTable point to a Probe, which would allow data from multiple probes to be combined into a single ExtracellularSeries. @CodyCBakerPhD what do you think?

rly commented 1 month ago

More concrently:

  • You don't have or need (like in the case of nwb as an intermediate analysis format) channel, contact or anatomical metadata. Just dump your stream into an ExtracellularSeries.
  • You have channel metadata (reference or filtering). That's great. Just like your channels to a channel table and add as much information as you want.
  • You have your probe information. Great, just link the channels to the probe that they are coming from. You don't need to make claims about the wiring but we know which channels come from which probe.

I think we should not allow case 1 and we should require that having an ExtracellularSeries means having a Probe (which is the case in #1), just like how in current NWB, an ElectricalSeries currently requires an ElectrodeGroup and Device.

If we make ContactsTable.relative_position_in_um optional (as proposed in https://github.com/catalystneuro/ndx-extracellular-channels/pull/1#issuecomment-2243824542), then a user can create a ExtracellularSeries with just the model, manufacturer, and identifier of the Probe.

I cannot think of a reason why a user would not have those basic probe metadata, and if we allow people to not specify that, some won't and that would reduce the reusability of the data.

h-mayorquin commented 1 month ago

Your proposal makes sense. However, in current NWB, data from a DAQ are sometimes organized into different objects to improve interpretability and reuse. For example, position, eye tracking, joystick movements, and other behavioral data are often acquired by the DAQ as analog channels, and these are stored in different NWB

This is a good point. Some DAQS have data that is definitley not ecephys so the identification is not perfect. What we want to map to the ChannelsTable is a subset of the DAQ channels: the ecephys channels. Thanks for this clarification.

I think we should not allow case 1 and we should require that having an ExtracellularSeries means having a Probe (which is the case in https://github.com/catalystneuro/ndx-extracellular-channels/pull/1), just like how in current NWB, an ElectricalSeries currently requires an ElectrodeGroup and Device.

Fair enough, that's probably too permisive. They can use something else for that case. I agree with you.

A user might want to spike sort all channels of the same probe together and separately from channels from another probe. They can do that under either proposal, but it may be slightly more intuitive under the original proposal.

Probeinterface has the concept of probe group and spikeinterface run_sorter function can be run "by_probe" when a recording has a probe group attached for this very purpose. I am not sure if this fact should sway you one way or the other on the point you were making but those abstractions exists in-place already. Now, maybe those abstractions appeared because the current way is not intuitive as you are saying? I am not sure in this one. I think I would be informative to ask Alessio what was the inception of the probe group concept.

I do not feel strongly about having each channel in a ChannelsTable point to a Probe, which would allow data from multiple probes to be combined into a single ExtracellularSeries. @CodyCBakerPhD what do you think?

@rly , I discussed this today with @CodyCBakerPhD in a personal meeting and he says he is OK with this. Please do confirm when you have a chance @CodyCBakerPhD . If you feel neutral about it I can do this PR.

If I understand your proposal correctly, each row in the ChannelsTable would link to a Probe, and if you have two Neuropixels probes, the ExtracellularSeries for AP data would contain the AP data for both probes, and the linked ChannelsTable would contain all 768 channels, where the first 384 link to Probe 1 and the second 384 link to Probe 2. The LF data would go into a separate ExtracellularSeries and ChannelsTable, for a total of two ExtracellularSeries and two ChannelsTable objects.

To be fair, now that we have probe insertion linked to probe and if you agree to make the mapping to the probes per row I don't think I have a strong view on this. I feel that the actual series should be separated (so 4 in this case) but I am less certain about the ChannelsTable. Following my own logic about the correspondence between ChannelsTable and a subset of the DAQ, there should only be one table and all the series should map to different regions of the same table. That is, assuming they were acquired with the PXI-e/Imec card duo as depicted here.

image

But I actually not very sure. What are your thoughts on this?

I will create a separate issue for the AnatomicalCoordinatesTable because I think this issue already has too many threads and some of them have already been solved but I think that point deserves its own issue.

CodyCBakerPhD commented 1 month ago

@rly , I discussed this today with @CodyCBakerPhD in a personal meeting and he says he is OK with this. Please do confirm when you have a chance @CodyCBakerPhD . If you feel neutral about it I can do this PR.

It will make more sense when I see it written in Mermaid or schematically on a draft PR, but in in principle if there is a real life use case that facilitates the adjustment then it should be done

h-mayorquin commented 1 month ago

Mmm after working through some bugs with electrode table on neuroconv I have changed my mind about row mapping to device fot fhe following reasons:

All of those seems like good reasons to keep the current correspondece of one ChannelsTable one device. Avoiding incorrectly writing the Intan case by default seems like a weak reason in the light of the arguments above and we can fix it by having a probe attached to the intan recording when this case shows up.

h-mayorquin commented 4 weeks ago

Hi, @rly , sorry for kind of deraling the intent of your initial comment. I thought we got caught up between discussing the neuropixel case that you bought up, the AnatomicalCoordinates table and my own proposal of making a map by row instead of by table.

I re-open another issue to center the discussion again along the lines of your original intent #8 . Maybe we can close this one?