AllenInstitute / AllenSDK

code for reading and processing Allen Institute for Brain Science data
https://allensdk.readthedocs.io/en/latest/
Other
344 stars 150 forks source link

Do not use monitor_delay in timestamps for running speed #2334

Open danielsf opened 2 years ago

danielsf commented 2 years ago

As a result of PR #2256, the StimulusTimestamps data object in allensdk/brain_observatory/behavior/data_objects is always created with a value for monitor_delay. While this is appropriate for timestamps associated with stimuli that are presented on the monitor, it is not appropriate for running speeds. Running speeds are measured directly from the running wheel. The monitor has nothing to do with them. To fix this, we will implement a new timestamps class that knows nothing about monitor_delay and make sure that the data object classes in allensdk/brain_observatory/behavior/data_objects/running_speed/ use that new class, rather than StimulusTimestamps. As a part of this work, any shared functionality between the new class and StimulusTimestamps should be factored out into a timestamps base class from which both inherit.

Tasks

The code defining StimulusTimestamps can be found in allensdk/brain_observatory/behavior/data_objects/timestamps/stimulus_timestamps/

Validation

aamster commented 2 years ago

Just a thought -- would you consider StimulusTimestamps to be a type of BehaviorTimestamps? It doesn't seem to directly involve behavior, since it is just the time at which the stimulus appeared on the monitor (I guess from the perspective of the mouse due to the monitor delay?) I would definitely consider RunningSpeedTimestamps to be a type of BehaviorTimestamps though. If you do consider stimulus timestamps to be behavior timestamps, what is your argument? If not, should the base class instead be called Timestamps? I know you are trying to achieve symmetry with OphysTimestamps by calling the base class BehaviorTimestamps.

danielsf commented 2 years ago

I do think StimulusTimestamps is a type of BehaviorTimestamps. In my mind, if we are presenting a stimulus, it is because we are doing an experiment on the mouse's behavior. The "behavior" in BehaviorTimestamps is more a comment on what we the humans are doing than on what the mouse is doing. I also want to somehow differentiate it from the OphysTimestamps class in allensdk/brain_observatory/behavior/data_objects/timestamps, which, I think, should not be a part of this refactor. If the base class was Timestamps, people are going to wonder why OphysTimestamps doesn't inherit from it.

danielsf commented 2 years ago

I'm open to the name BeahviorSessionTimestamps, if you think that will avoid confusion

aamster commented 2 years ago

Doesn't BeahviorSessionTimestamps run into the same issue you identified above: I don't think we ever just collect data without the behavior part, so wouldn't all timestamps inherit from BehaviorSessionTimestamps?

danielsf commented 2 years ago

Now that we are removing the to_json and from_json API for StimulusTimestamps (see #2357), we may not need to implement this ticket as described. Since each data object that relies on timestamps will have to create those timestamps from_stimulus_file or from_sync_file, we can make sure that those invocations set monitor_delay=0.0 where appropriate.

Additionally, we can add checks to the __init__ of data objects that should have monitor_delay=0.0, to raise an exception of not np.isclose(stimulus_timestamps._monitor_delay, 0.0) like so

class RunningSpeed(DataObject, LimsReadableInterface, NwbReadableInterface,
                   NwbWritableInterface, JsonReadableInterface,
                   JsonWritableInterface):
    """A DataObject which contains properties and methods to load, process,
    and represent running speed data.
    Running speed data is represented as:
    Pandas Dataframe with the following columns:
        "timestamps": Timestamps (in s) for calculated speed values
        "speed": Computed running speed in cm/s
    """

    def __init__(
        self,
        running_speed: pd.DataFrame,
        stimulus_file: Optional[BehaviorStimulusFile] = None,
        sync_file: Optional[SyncFile] = None,
        stimulus_timestamps: Optional[StimulusTimestamps] = None,
        filtered: bool = True
    ):
        super().__init__(name='running_speed', value=running_speed)

        if stimulus_timestamps is not None:
            if not np.isclose(stimulus_timestamps._monitor_delay, 0.0):
                raise RuntimeError(
                    "Running speed timestamps have montior delay "
                    f"{stimulus_timestamps._monitor_delay}; there "
                    "should be no monitor delay applied to the timestamps "
                    "associated with running speed")

        self._stimulus_file = stimulus_file
        self._sync_file = sync_file
        self._stimulus_timestamps = stimulus_timestamps
        self._filtered = filtered
danielsf commented 2 years ago

Here is an additional complication: if you generate a BehaviorOphysExperiment.from_lims, the stimulus timestamps that are read from the sync file get "forgotten" when you go to create the RunningSpeed data object in the BehaviorSession. Specifically:

This block of code constructs the StimulusTimestamps for an ophys experiment from a sync file

https://github.com/AllenInstitute/AllenSDK/blob/vbn_2022_dev/allensdk/brain_observatory/behavior/behavior_ophys_experiment.py#L177-L179

It gets passed to the related BehaviorSession here

https://github.com/AllenInstitute/AllenSDK/blob/vbn_2022_dev/allensdk/brain_observatory/behavior/behavior_ophys_experiment.py#L185-L191

The behavior session will create the necessary RunningSpeed objects here

https://github.com/AllenInstitute/AllenSDK/blob/vbn_2022_dev/allensdk/brain_observatory/behavior/behavior_session.py#L183-L194

Note that the StimulusTimestamps from the sync file are passed along. However, because those stimulus timestamps have a non-zero monitor_delay and we now know that RunningSpeed should not have monitor delay applied, this block of code

https://github.com/AllenInstitute/AllenSDK/blob/vbn_2022_dev/allensdk/brain_observatory/behavior/data_objects/running_speed/running_speed.py#L1[…]37

reconstructs the stimulus timestamps from the stimulus file. The knowledge of the existence of the sync file has been forgotten.

Note that, before #2364 was merged, the StimulusTimestamps created by the BehaviorOphysExperiment were passed all the way through to the RunningSpeed data object. This is to be contrasted with creating a BehaviorOphysExperiment.from_json, in which case, the StimulusTimestamps created from the sync file are not passed through to the BehaviorSession object. BehaviorOphysExperiment.from_json calls BehaviorSession.from_json which can only know about the stimulus file.

See

https://github.com/AllenInstitute/AllenSDK/blob/vbn_2022_dev/allensdk/brain_observatory/behavior/behavior_ophys_experiment.py#L373-L375

https://github.com/AllenInstitute/AllenSDK/blob/master/allensdk/brain_observatory/behavior/behavior_session.py#L113-L119

I think the problem is that we have created from_json and from_lims APIs for objects which are not actually be created from those sources. Metadata, stimulus files, and sync files are being constituted from_json and from_lims. They either require paths to files or data from columns in the LIMS database. RunningSpeed, Licks, Rewards, Timestamps etc. are being constituted from_stimulus_file or from_sync_file. The only purpose of from_lims or from_json is to supply them with the paths to those files. I wonder if the correct thing to do is to strip away from the from_lims and from_json APIs from objects that really just need the path to a stimulus/sync file and then force their parent objects (BehaviorSession, BehaviorOphysExperiment, etc.) to pass the right combination of sync and stimulus files around when creating these constitutent data objects.