catalystneuro / neuroconv

Create NWB files by converting and combining neural data in proprietary formats and adding essential metadata.
https://neuroconv.readthedocs.io
BSD 3-Clause "New" or "Revised" License
51 stars 23 forks source link

[Discussion] semantics of `set_aligned_starting_time` #608

Open h-mayorquin opened 1 year ago

h-mayorquin commented 1 year ago

This probably should have come up before but it escaped me back then. The current function for set_aligned_starting_time shifts the timestamps: https://github.com/catalystneuro/neuroconv/blob/a239c27a07e49544acf0acea63906bd8abb0669c/src/neuroconv/basetemporalalignmentinterface.py#L58-L69

That is, if your timestamps are currently [1, 2, 3, 4] and the starting_time is 3 then the resulting timestamps of the interface will be [4, 5, 6, 7]. We have shifted everything, in fact, that's how we call the test:

https://github.com/catalystneuro/neuroconv/blob/a239c27a07e49544acf0acea63906bd8abb0669c/src/neuroconv/tools/testing/data_interface_mixins.py#L175-L185

But the way that I am feeling now when working with this as a user is that after using set_aligned_starting_time the resulting timestamps should start from the set starting_time. That is, I would have expected the result above to be [3, 4, 5, 6]. That way, when the interface write the timestamps, the starting time will be in fact 3 and not 4 as it is currently.

It appears to me that the current method is shifting and not setting. What do you guys think?

CodyCBakerPhD commented 1 year ago

It appears to me that the current method is shifting and not setting

It's setting a shift factor - if no timestamps present, this defines the start time - if timestamps, those timestamps are unaligned or else why would you be using this method, therefore we shift by the constant starting time to become aligned

h-mayorquin commented 1 year ago

It's setting a shift factor

Yeah, that's the problem to me, I don't think on the starting_time as a shift factor. I want the semantics of the starting_time in neuroconv to be the be the starting_time that the corresponding nwb object will have. set_aligned_starting_time should fulfill that promise instead of shifting.

if timestamps are unaligned or else why would you be using this method

Gross alignment vs fine alignment. The timestamps are fine I just want to set the correct start time because a video recording the behavior, for example, did not start until 20 seconds before a trial period and I want to move them there.

Right now, I will need to extract the timestamps, modify them and use the set_aligned_timestamps which adds complications. What's the gain of this?

CodyCBakerPhD commented 1 year ago

Gross alignment vs fine alignment.

There is no such separation in NWB

There is only the vector of timestamps

Right now, I will need to extract the timestamps, modify them and use the set_aligned_timestamps which adds complications. What's the gain of this?

if you need to extract it anyway, just perform whatever operations you need to when you set the full timestamps - no need to interact with any other method.

The automated shifting is for interfaces that automatically fetch their vector, which may or may not be regular a priori, and determination of which convention (start time + rate or timestamps) occurs later on

h-mayorquin commented 1 year ago

There is no such separation in NWB

There is only the vector of timestamps

I don't understand your argument here. There is a starting_time as well. I am just describing two types of problem that I am facing when working in a conversion: Setting timestamps with a TTL (fine alignment) vs moving interfaces in time with respect to each to other (gross alinngment) if you have better terms for this timestamp related operation I am fine with whatever you might want to come up with.

if you need to extract it anyway, just perform whatever operations you need to when you set the full timestamps - no need to interact with any other method.

I don't need to extract it, I want to set the starting_time correctly. Extracting timestamps is a costly operation in some interfaces as well.

Software is about trade-offs. I gave you two positive points about the proposal. Could you give some possible cost of the proposal or the benefit of the status quo (i.e. making the method shift)?

CodyCBakerPhD commented 1 year ago

Setting timestamps with a TTL (fine alignment) vs moving interfaces in time with respect to each to other (gross alinngment)

Setting what you call 'fine alignment' via the TTL should already be in the common clock, so why would you need to shift it by anything else?

Maybe we're missing context here...

Could you fully describe your current alignment setup and we can analyze if you're utilizing the methods correctly, and if so where the confusion of method names comes into play?

h-mayorquin commented 1 year ago

Let me try this differently without the fine alingment vs gross alingment distinction.

I am saying that maybe we should modify the semantics of the function so this assertion passes (currently it does not):

from neuroconv.tools.testing.mock_interfaces import MockRecordingInterface
from pynwb.testing.mock.file import mock_NWBFile

recording_interface_1 = MockRecordingInterface(num_channels=4, sampling_frequency=30_000.0, durations=[1])
recording_interface_2 = MockRecordingInterface(num_channels=4, sampling_frequency=30_000.0, durations=[1])

timestamps2 = 10 + recording_interface_1.get_timestamps()
recording_interface_2.recording_extractor.set_times(times=timestamps2, with_warning=False)

starting_time_of_recording_2 = 5.0
recording_interface_2.set_aligned_segment_starting_times(aligned_segment_starting_times=[starting_time_of_recording_2])

nwbfile = mock_NWBFile()

metdata = recording_interface_2.get_metadata()

recording_interface_2.add_to_nwbfile(nwbfile=nwbfile, metadata=metdata)
nwbfile.acquisition["ElectricalSeries"].starting_time

assert starting_time_of_recording_2 == nwbfile.acquisition["ElectricalSeries"].starting_time

The proposal is: The staring time set should be the starting time of the object added

And I am asking you is:

CodyCBakerPhD commented 1 year ago

Thanks for providing at least one example to work with

However, the call pattern in the example is exactly what worries be because I see it as a misapplication of the methods

More complicated than it needs to be

If you take a look at

timestamps2 = 10 + recording_interface_1.get_timestamps()
recording_interface_2.recording_extractor.set_times(times=timestamps2, with_warning=False)

starting_time_of_recording_2 = 5.0
recording_interface_2.set_aligned_segment_starting_times(aligned_segment_starting_times=[starting_time_of_recording_2])

why did you not add your starting_time_of_recording_2 to timestamps2 to being with? (and why did you add 10 anyway?) And why did you not call set_aligned_timestamps (or set_aligned_segment_timestamps if you prefer)

I'd say the current example is more complicated than it needs to be to achieve the desired result, but that's the next point.

Expectations vs. Reality

Thanks for providing the assertion you're going for - and this related to what I see as the bigger problem, though you haven't brought it up yet so I'm trying to guide you to it - there is absolutely no guarantee from any of the details in the example above (aside from implicit knowledge about the Mock design but lets abstract away to a real world recording interface) that you will even end up with a starting_time+rate instead of timestamps to begin with based on the call patterns of these methods

Though even if we restrict to the overly complicated example, and make the hard assumption that all timestamps have been regular, I would say that the expected starting_time of the ElectricalSeries ought to be 10.0 + 5.0 + recording_interface_1.get_timestamps()[0] - and I'm wondering why, conceptually, you would expect it to be 5.0? Simply because you said set_aligned_segment_starting_time = 5.0? I can see how this causes confusion, however, that's the next point

Education

Hardest thing about temporal alignment is educating the usage of it, regardless of what the methods are named. Did the docs fail you in some way? Can they be improved to communicate this expected usage better?

Generality

One reason I am so reluctant to revisit a complete backcompatability break just to rename one of these methods is because they apply to all 40+ interfaces we support, and in several of them they may or may not have the exact behavior you expected here - meaning it's not a problem

Currently your issue is with recording interfaces (along with their very specific additional 'segment' related methods), can you list out all the others where you'd see this occur?

What is the symptom of the larger problem? Do you know what it is and how to fix it?

CodyCBakerPhD commented 1 year ago

Also

Real world example

We try to limit our design and application of the temporal alignment methods to relate the most to real world setups instead of contriving all variety of edge cases ourselves

Is your example inspired in any way by a recent conversion? Have you read about the setup anywhere? What are all the various time shifts related to (parsed by TTLs? What is the TTL setup? etc.)

No reason to do extra engineering for a case unless it is actually needed in the real world

CodyCBakerPhD commented 1 year ago

From discussion: much internal confusion has been alleviated but passed up to how we represent starting_time and timestamps[0] in the NWB sense when we have a situation where format-specific timestamps are pre-aligned to an unexpected (and potentially untracked) time basis

Real life example:

first timestamp of FicTrac is actually 1/sampling_frequency; for simple example, let's say sampling_frequency = 30.0 in which case fictrac_timestamps = 0.033... which is relative to the start of the camera device

there are two cases of TTLs that could be sent to align to a common clock from the camera; a TTL sent when camera turns on (in which case our alignment by a starting time method works entirely as expected and intuitive); or a TTL sent on first frame capture, which is the more common case observed so far in practice

But in that second case our alignment approach becomes fairly confusing because we would need to correct for the amount of pre-shift of the format specific timestamps relative to their device starting time

A better approach might be to coerce starting times to be relative to the first piece of data captured, such that all data streams in NeuroConv interfaces reliably start from zero in their own time basis and therefore simplify the act of aligning different streams to a single clock, which would directly lead to a consistent understanding of the semantic behavior as requested above

Before proceeding however, I think we should take stock of what other cases we know of format-specific timestamps that do not start from zero and if the coercion might cause any problems as far as we can foresee

h-mayorquin commented 1 year ago

Thanks for the summary of our discussion @CodyCBakerPhD.

bendichter commented 1 year ago

So if I read this correctly, it sounds like @h-mayorquin you are noting that set_aligned_starting_time does not really set to that value, but in fact shifts all time by that value, right? I would agree that this is confusing- if I were an external user and saw "set" then I would likely assume I am truly setting to that value. @CodyCBakerPhD, I think you are mostly making the point that shifting is the operation that is most needed in real-world applications, rather than truly setting the start time. What if we keep the function as-is, but rename it to something like .shift_time() and smoothly deprecate .set_aligned_starting_time()? Then the functionality would remain the same and the name would be less confusing. What do you guys think of that?

h-mayorquin commented 1 year ago

That is correct. My opinion is that a method named set_aligned_starting_time should set the starting_time of the dataset or dynamic table object (in this case the smallest start_time).

Having two methods and making the current set_starting_time to just callshift_times to keep the current behavior good solution to me. When we find an interface where the current method does not work to set the starting time, then we can change set_aligned_starting_time.

That said, I think that @CodyCBakerPhD did present to me some complexities that made me decrease my confidence on my take. Sadly, I am falling to recall those right now.

CodyCBakerPhD commented 2 months ago

@h-mayorquin I think that is semantically fine to adjust all set_aligned_starting_time to be strictly setter operations, not with in-place modifications

h-mayorquin commented 2 months ago

Thanks for looking into this again. I need to re-read it.

h-mayorquin commented 1 month ago

OK. That make sense, I will do this.