jkbhagatio opened this issue 1 year ago
@jkbhagatio This is already easy to do, since we have the exact time ranges of maintenance mode.
Splitting into different files raises complicated questions, such as "which files?". Would we be required to cut off video files and all other streams into separate "maintenance" data?
It also raises the question of why maintenance data would be more important than other environment "states". I can also easily imagine wanting an easy way to split out data from only the "easy" period of the experiments, versus only the "uncertain" period, or only the times when there are actually subjects in the arena, etc.
Fortunately, this is all already built into the way data is organised by time. Given that all data is synchronised in time, it is trivial to split datasets post hoc by time ranges. In fact, this is how we have always split data into visits, epochs, bouts, etc.
It could be useful to make an auxiliary method just to exclude data falling between specific events (we already have one to get all data within pairs of events), but I would really avoid changing the structure of data logging at all.
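To make the "split by time ranges" point concrete, here is a minimal sketch; the data frame below is made up for illustration (a real frame would come from the loading API), but any time-indexed frame behaves the same way:

```python
import numpy as np
import pandas as pd

# Stand-in for a time-indexed frame as returned by the loading API
# (illustrative data only).
index = pd.date_range("2023-08-01 09:00:00", periods=12 * 60, freq="1min")
data = pd.DataFrame({"value": np.arange(len(index))}, index=index)

# Splitting a visit/epoch/bout post hoc is just label-based slicing on the
# time index.
visit = data.loc["2023-08-01 10:00:00":"2023-08-01 18:30:00"]

# Excluding a known maintenance window is a boolean mask over the same index.
start, end = pd.Timestamp("2023-08-01 12:00:00"), pd.Timestamp("2023-08-01 12:45:00")
clean = visit[~((visit.index >= start) & (visit.index < end))]
```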
@glopesdev
It could be useful to make an auxiliary method just to exclude data falling between specific events
Yes, this could be one potential solution.
An issue right now can be seen in the following example:
Imagine I want to analyze data from many devices over a one-week period, with many times within that week that I want to ignore (e.g. maintenance periods, or "task periods" that are irrelevant to my current analysis).
Right now, I think there are two obvious ways to get the data I care about during this one-week period:
1) Load all data from this period, and then filter out the data I don't care about based on the timestamps of some metadata info (like maintenance periods, "task periods", etc.). This is inefficient.
2) First find all the subperiods of time I don't want to analyze, use these to find the subperiods I do want to analyze, call api.load() once for each continuous subperiod I do want to analyze, and then concatenate all these subperiods of data into one dataframe per device (roughly as in the sketch below). IMO this is too much work for the user, and there should be a simpler solution.
An amendment to api.load() and/or a helper function would solve this issue.
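For concreteness, here is roughly what option 2 amounts to today. This assumes api.load accepts start/end keyword arguments, as in aeon.io.api.load; the complement computation and helper name are only a sketch, not an existing aeon_mecha function:

```python
import pandas as pd
import aeon.io.api as api

def load_excluding(root, reader, start, end, excluded):
    """Load only the subperiods of [start, end) that avoid every
    (exc_start, exc_end) pair in `excluded`, then concatenate.

    Assumes `excluded` is sorted, non-overlapping, and contained in
    [start, end); this is a sketch, not an aeon_mecha function.
    """
    frames = []
    cursor = start
    for exc_start, exc_end in excluded:
        if cursor < exc_start:
            frames.append(api.load(root, reader, start=cursor, end=exc_start))
        cursor = max(cursor, exc_end)
    if cursor < end:
        frames.append(api.load(root, reader, start=cursor, end=end))
    return pd.concat(frames) if frames else pd.DataFrame()
```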
Load all data from this period, and then filter out the data I don't care about based on the timestamps of some metadata info (like maintenance periods, "task periods", etc.). This is inefficient.
Actually, I don't think this is that inefficient. The rationale is that for any given period of interest (e.g. a visit), we would indeed be interested in loading all the data, had there not been maintenance periods in between. The loading of the chunks is often the real bottleneck, not the filtering afterwards once they are in memory, especially given that maintenance period events are still very sparse.
I was thinking simply of a helper that would take pairs of "undesired" periods and a data frame of interest and generate a "mask" for the pandas frame that cuts out the invalid chunks.
I actually don't think you necessarily even want to do this to all the data, but mostly to sparse event streams like pellet deliveries, threshold crossings, etc., which are all tiny compared to, for example, 500 Hz wheel data.
Once you have the sparse events of interest resolved, you can query the continuous data safely and without processing, as we would in the normal case.
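A minimal sketch of what such a mask helper could look like; the name and signature are hypothetical, not an existing aeon_mecha function, and it assumes the frame is indexed by timestamp:

```python
import pandas as pd

def exclude_mask(frame: pd.DataFrame, periods) -> pd.Series:
    """Boolean mask that is True for rows falling outside every
    (start, end) period in `periods`. `frame` must be time-indexed."""
    mask = pd.Series(True, index=frame.index)
    for start, end in periods:
        mask &= ~((frame.index >= start) & (frame.index < end))
    return mask

# Example: keep only pellet delivery events outside maintenance periods.
# pellets = pellets[exclude_mask(pellets, maintenance_periods)]
```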
The rationale is that for any given period of interest (e.g. a visit), we would indeed be interested in loading all the data, had there not been maintenance periods in between
Sure, I think this is fine for short maintenance periods, but I was thinking of maintenance periods that last multiple chunks.
@jkbhagatio do we still need this? I still stand by all the above points, and believe this issue would be better addressed by improved helper functions in aeon_mecha. Let's discuss tomorrow; if this makes sense, I would transfer this issue to aeon_mecha.
@jkbhagatio closing this for now, feel free to reopen if needed or transfer to aeon_mecha.
Hey, reopening this. I'll get to it. Can't transfer an issue from a private to a public repo.
For ease of analysis of both "real experiment" and "troubleshooting" data, it could be nice to have something like this in place, where events that happen in maintenance mode are easily discriminable from events that happen during "real experiment" mode.
There are potentially many ways to do this, and I don't have a good sense right now of how complicated it would be, but it's worth at least a discussion, I think.
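Purely as an illustration of one possible shape (all names here are hypothetical), events could be tagged rather than dropped, so that both "real experiment" and "troubleshooting" analyses stay easy:

```python
import pandas as pd

def flag_maintenance(events: pd.DataFrame, maintenance_periods) -> pd.DataFrame:
    """Return a copy of `events` with a boolean 'maintenance' column set for
    rows whose timestamp falls inside any (start, end) maintenance period."""
    flag = pd.Series(False, index=events.index)
    for start, end in maintenance_periods:
        flag |= (events.index >= start) & (events.index < end)
    return events.assign(maintenance=flag)

# Downstream analysis can then keep or drop flagged rows explicitly, e.g.:
# experiment_events = flagged[~flagged["maintenance"]]
```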