catalystneuro / neuroconv

Create NWB files by converting and combining neural data in proprietary formats and adding essential metadata.
https://neuroconv.readthedocs.io
BSD 3-Clause "New" or "Revised" License

[Feature]: Synchronization specification in YAML #249

Open CodyCBakerPhD opened 1 year ago

CodyCBakerPhD commented 1 year ago

What would you like to see added to NeuroConv?

As the [Sync I-V] PRs stand now, they allow code/API level interactions between interfaces to specify how to synchronize the timing of the data streams. It remains unclear exactly how to propagate this to the YAML (no-coding) structure.

Setup:

Assume we have, at the innermost session level (which could have aggregated some interfaces from the global or experiment level), something like

data_interfaces:
  - SpikeGLXLFPInterface
  - SpikeGLXNIDQInterface
  - AudioInterface
  - BehaviorEventInterface
  - VideoInterface

where:

  1. The AudioInterface needs its timing info to be parsed from a certain channel of the SpikeGLXNIDQInterface.
  2. The BehaviorEventInterface needs its timing info to be synchronized between both the AudioInterface and the SpikeGLXNIDQInterface.
  3. The VideoInterface had a delayed start relative to the SpikeGLXNIDQInterface, so we need to align it with the first TTL pulse; aside from that, the timestamps are synchronized.

Here are some proposals:

(a)

Perhaps we specify this as a new block named synchronize, following the structure

synchronize:
  InterfaceToBeSynchronized:
    synchronization_method:
      synchronization_source:  # such as another interface
        options_for_sync_source:  # like which channel ID

Example:

synchronize:
  AudioInterface:
    synchronize_timestamps:
      SpikeGLXNIDQInterface:
        channel_name: "name_of_nidq_channel_used_for_audio_sync"
  BehaviorEventInterface:
    synchronize_between_systems:
      primary:
         SpikeGLXNIDQInterface:
           channel_name: "name_of_nidq_channel_used_for_audio_sync"
      secondary:
         unsynchronized_audio_timestamps
  VideoInterface:
    synchronize_start_time:
      SpikeGLXNIDQInterface:
        channel_name: "name_of_nidq_channel_used_for_camera_sync"
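
For concreteness, a rough sketch of how the parsing code might dispatch such a block onto the instantiated interfaces (the attribute and method names below are placeholders for whatever the Sync PRs end up exposing):

for interface_name, sync_methods in yaml_dict["synchronize"].items():
    interface = converter.data_interface_objects[interface_name]  # instantiated interface objects
    for method_name, source_spec in sync_methods.items():
        # e.g. method_name = "synchronize_timestamps",
        # source_spec = {"SpikeGLXNIDQInterface": {"channel_name": "..."}}
        getattr(interface, method_name)(source_spec)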

(b)

Or perhaps it can be a new field specified alongside other options for each interface

source_data:
  InterfaceToBeSynchronized:
    primary_interface_options: ...
    ....
    synchronization_method:
      synchronization_source:  # such as another interface
        options_for_sync_source:  # like which channel ID

Example:

source_data:
  AudioInterface:
    file_path: "path/to/audio_file.wav"
    synchronize_timestamps:
      SpikeGLXNIDQInterface:
        channel_name: "name_of_nidq_channel_used_for_audio_sync"
  BehaviorEventInterface:
    file_path: "path/to/behavior_file.mat"
    variable_name: "poke_times"
    synchronize_between_systems:
      primary:
         SpikeGLXNIDQInterface: # Without further nesting implies using the timestamps of this interface itself
           channel_name: "name_of_nidq_channel_used_for_audio_sync"
      secondary:
         unsynchronized_audio_timestamps
  VideoInterface:
    file_path: "path/to/video.avi"
    synchronize_start_time:
      SpikeGLXNIDQInterface:
        channel_name: "name_of_nidq_channel_used_for_camera_sync"

If we think about it like a web form, perhaps this one makes more sense, as synchronization is something you think about at the same time as the other interface-level specifications.

Do you have any interest in helping implement the feature?

Yes.


bendichter commented 1 year ago
      secondary:
         SpikeGLXNIDQInterface:  # <-- did you mean AudioInterface
           channel_name: "name_of_nidq_channel_used_for_audio_sync"
CodyCBakerPhD commented 1 year ago
      secondary:
         SpikeGLXNIDQInterface:  # <-- did you mean AudioInterface
           channel_name: "name_of_nidq_channel_used_for_audio_sync"

Now that's an interesting one - I reference the specific channel of the NIDQ used to sync Audio, rather than the actual Audio timestamps, to avoid implying that the sync operations we've specified must occur in a specific order (as is, they are technically order-less dictionaries).

If we referred to the AudioInterface there as the source for the secondary timings, the result would change depending on when the AudioInterface times get synchronized in this chain of events.

bendichter commented 1 year ago

@CodyCBakerPhD

  1. The AudioInterface needs its timing info to be parsed from a certain channel of the SpikeGLXNIDQInterface.
  2. The BehaviorEventInterface needs its timing info to be synchronized between both the AudioInterface and the SpikeGLXNIDQInterface.
  3. The VideoInterface had a delayed start relative to the SpikeGLXNIDQInterface, so we need to align it with the first TTL pulse; aside from that, the timestamps are synchronized.

Can I clarify this, just so I can make sure we are talking about the same thing?

  1. The times recorded from the TTL pulses of a specific channel of SpikeGLXNIDQInterface should be used as the timestamps for AudioInterface, right?
  2. Since the BehaviorEventInterface is presumably derived from the raw data of the AudioInterface, it makes sense that it would have the same timing. So I suppose we would want to use the timestamp interpolation method here, interpolating the timestamps from the BehaviorEventInterface using the mapping from the AudioInterface (x) to the SpikeGLXNIDQInterface (y)?
bendichter commented 1 year ago

I prefer (b). I think it does a better job of isolating independent components. I don't quite understand the BehaviorEventInterface part, though. What do you think of this?

source_data:
  AudioInterface:
    file_path: "path/to/audio_file.wav"
    synchronize_timestamps:
      SpikeGLXNIDQInterface:
        channel_name: "name_of_nidq_channel_used_for_audio_sync"
  BehaviorEventInterface:
    file_path: "path/to/behavior_file.mat"
    variable_name: "poke_times"
    synchronize_between_systems:
      primary:
        SpikeGLXNIDQInterface:  # Without further nesting implies using the timestamps of this interface itself
          channel_name: "name_of_nidq_channel_used_for_audio_sync"
      secondary:
         AudioInterface  # <-- wouldn't this make more sense? Maybe I am misunderstanding
  VideoInterface:
    file_path: "path/to/video.avi"
    synchronize_start_time:
      SpikeGLXNIDQInterface:
        channel_name: "name_of_nidq_channel_used_for_camera_sync"
CodyCBakerPhD commented 1 year ago

The times recorded from the TTL pulses of a specific channel of SpikeGLXNIDQInterface should be used as the timestamps for AudioInterface, right?

Yes, specifically the conveniently named channel ID "name_of_nidq_channel_used_for_audio_sync".

Since the BehaviorEventInterface is presumably derived from the raw data of the AudioInterface, it makes sense that it would have the same timing. So I suppose we would want to use the timestamp interpolation method here, interpolating the timestamps from the BehaviorEventInterface using the mapping from the AudioInterface (x) to the SpikeGLXNIDQInterface (y)?

Yes, the original unsynchronized behavioral event times occur in the same time domain as the Audio system, but there are no actual TTL pulses for the behavior events, only the times of acquisition for frames in the acoustic series of the audio interface. We want to map those times to the NIDQ (primary/common) time domain to directly compare to some ecephys data.

What do you think of this? <-- wouldn't this make more sense? Maybe I am misunderstanding

Yes, that does make more sense, but it still has the problem of https://github.com/catalystneuro/neuroconv/issues/249#issuecomment-1347222179 - the reference to the AudioInterface as the secondary timestamps when synchronizing the BehaviorEventInterface. That reference is only correct if the AudioInterface has not already performed its own synchronization operation before the BehaviorEventInterface does, but since these are dictionaries it just seems like bad practice to rely on an ordering assumption like that. It seems better to use values you know to be globally true irrespective of order. I've adjusted the original post to reflect this as far as I can right now.

An explicit example just to be clear:

Primary (Audio, synchronized to be as NIDQ sees them) timestamps: [1.2, 1.31, 1.42]  # 1.2 second delay in start of audio while mimicking 0.01 second drift for 10 Hz rate
Secondary (Audio, unsynchronized) timestamps: [0.0, 0.1, 0.2] 
Unsynchronized behavior event times (in unsynchronized audio time): [0.06, 0.15]
Synchronized behavior event times (in synchronized NIDQ time): [1.266, 1.365]
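
That mapping is just a linear interpolation of the event times through the (unsynchronized audio) -> (synchronized NIDQ) correspondence; a quick check of the numbers above:

import numpy as np

unsynchronized_audio_timestamps = [0.0, 0.1, 0.2]
synchronized_audio_timestamps = [1.2, 1.31, 1.42]  # as the NIDQ sees them
unsynchronized_behavior_event_times = [0.06, 0.15]

synchronized_behavior_event_times = np.interp(
    x=unsynchronized_behavior_event_times,
    xp=unsynchronized_audio_timestamps,
    fp=synchronized_audio_timestamps,
)
# array([1.266, 1.365])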

I suppose a solution on the YAML side could be to have references like yours to the AudioInterface timestamps refer to their values before any synchronization is done, but that still seems like it might be confusing... When this is done purely in code it's obvious, because you can store the unsynchronized_audio_timestamps prior to doing anything to the AudioInterface and then reuse them later wherever they're needed.

bendichter commented 1 year ago

Oh, I understand now, but I really don't see how this synchronization can be done without knowing the original Audio timestamps. You need to do this operation:

import numpy as np

synched_behav_timestamps = np.interp(
    x=BehaviorEventInterface.original_timestamps,
    xp=AudioInterface.original_timestamps,
    fp=SpikeGLXNIDQInterface.ttl_timestamps["name_of_nidq_channel_used_for_audio_sync"],
)

How can this be done without referencing AudioInterface.original_timestamps?

CodyCBakerPhD commented 1 year ago

but I really don't see how this synchronization can be done without knowing the original Audio timestamps.

Right, it can't. But that's easy to accomplish when doing this in code

unsynchronized_audio_timestamps = my_audio_interface.get_times()

my_audio_interface.synchronize_timestamps(...)  # using the NIDQ times
my_behavior_event_interface.synchronize_between_systems(
    primary=SpikeGLXNIDQInterface.ttl_timestamps["name_of_nidq_channel_used_for_audio_sync"],
    secondary=unsynchronized_audio_timestamps,
)

And I'm saying we could make it clear in the YAML usage that when saying something like

    synchronize_between_systems:
      primary:
        SpikeGLXNIDQInterface:
          channel_name: "name_of_nidq_channel_used_for_audio_sync"
      secondary:
         AudioInterface  # <-- This refers to the unsynchronized timestamps

and in the code for parsing the YAML, if we see an interface specified like that, we grab its timestamps before applying synchronization so they can be passed into other sync methods that might need to use them.
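
Roughly, the parsing code could do something like the following (the helper names and get_times here are placeholders, just to show the idea of caching the pre-synchronization values):

# Cache the timestamps of any interface referenced as a `secondary` source,
# before any synchronization has been applied to it
unsynchronized_timestamps = dict()
for interface_name in interfaces_referenced_as_secondary:  # hypothetical: names found under `secondary` keys
    unsynchronized_timestamps[interface_name] = converter.data_interface_objects[interface_name].get_times()

# ... later, after the AudioInterface itself has been synchronized to the NIDQ ...
my_behavior_event_interface.synchronize_between_systems(
    primary=nidq_ttl_times,  # TTL times from the specified NIDQ channel
    secondary=unsynchronized_timestamps["AudioInterface"],
)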

IDK that could still get confusing to a user though...

would need something like

    synchronize_between_systems:
      primary:
         AudioInterface: synchronized
      secondary:
         AudioInterface: unsynchronized

to clearly communicate it.

bendichter commented 1 year ago

I think it's pretty clear we are forming a map from the unsynchronized timestamps of audio. We don't need to synchronize in any order. We just need to make sure we keep around the unsynchronized timestamps and always use those, even if the synchronized timestamps exist.