nikhilchandra opened this issue 2 months ago
Sounds cool to me. I'm sure @h-mayorquin is also interested.
That sounds great. Concerning the question of what to do when asking for out-of-bounds data: that is not clear to me. Is this something that you have made a decision about, @zm711?
Hi. On the spikeinterface side we decided to raise an error. On the neo side this has never been clear. I think the most common behavior is to clip, not to zero-pad.
I'm looking right now and a bunch of the memmap ones just do slicing [start:stop, :]
so that would just follow whatever the numpy memmap convention allows, no? Where is the clipping machinery?
@samuelgarcia Please, could you clarify what you mean by "clipping"?
Hi, sorry, just reading this now.
I am a bit fuzzy on the details.
Clipping was for: if stop in [start:stop, :]
is greater than shape[0], then stop is "clipped" to shape[0].
In short, neo very often behaves like numpy for the upper boundary,
but spikeinterface does not; it is stricter.
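To make the two behaviors concrete, a minimal sketch with a plain numpy array standing in for a memmap (the names are just for illustration):

```python
import numpy as np

data = np.arange(20).reshape(10, 2)  # stand-in for a memmap with shape (10, 2)

# numpy-style slicing silently clips the upper bound to shape[0]
chunk = data[5:100, :]
print(chunk.shape)  # (5, 2) -- no error, you just get fewer rows than requested

# a stricter, spikeinterface-style check raises instead of clipping
start, stop = 5, 100
if start < 0 or stop > data.shape[0]:
    raise ValueError(f"frames [{start}, {stop}) are outside [0, {data.shape[0]})")
```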
@h-mayorquin It looks like plexon2rawio.py downloads the relevant DLLs from this location:
https://raw.githubusercontent.com/Neuralensemble/pypl2/master/bin/{file_name}
where "file_name" can take on the value PL2FileReader.dll or PL2FileReader64.dll. How does one go about submitting updated DLLs to this location?
Another problem -- the functions _segment_t_start and _segment_t_stop take block_index and seg_index as arguments. However, for PL2 files the precise start/stop times for a given segment depend on the stream. For example, here are segment start times for 3 wideband and 3 field potential channels from the same recording.
WB001 segment timestamps: [ 67 3609805 7217229 10824175 14430721 18036311 21642619]
WB002 segment timestamps: [ 67 3609805 7217229 10824175 14430721 18036311 21642619]
WB003 segment timestamps: [ 67 3609805 7217229 10824175 14430721 18036311 21642619]
FP001 segment timestamps: [ 67 3609827 7217267 10824187 14430747 18036347 21642627]
FP002 segment timestamps: [ 67 3609827 7217267 10824187 14430747 18036347 21642627]
FP003 segment timestamps: [ 67 3609827 7217267 10824187 14430747 18036347 21642627]
Note that within streams the values are identical, but across streams they differ slightly. I am not sure what the correct approach is to incorporate these observations into the existing framework.
Note: all values are in units of ticks, not seconds.
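For what it's worth, converting those ticks to per-stream start times in seconds shows how small the cross-stream discrepancy is (a quick sketch; the 40 kHz tick clock is the one described further down in this thread):

```python
TICK_RATE_HZ = 40_000  # assumed PL2 tick clock, see discussion below

wb_ticks = [67, 3609805, 7217229, 10824175, 14430721, 18036311, 21642619]
fp_ticks = [67, 3609827, 7217267, 10824187, 14430747, 18036347, 21642627]

wb_t_start = [t / TICK_RATE_HZ for t in wb_ticks]  # per-stream segment starts, seconds
fp_t_start = [t / TICK_RATE_HZ for t in fp_ticks]

# largest cross-stream discrepancy is 38 ticks, i.e. under 1 ms
print(max(abs(a - b) for a, b in zip(wb_ticks, fp_ticks)))  # 38
```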
This is because the data is organized in packets?
How could the timestamps of the wideband and field potential be different if it is the same data, just filtered differently? Unless the headstage or equipment applies a slight delay for the onboard filtering and then sends that stream slightly later?
WB and FP sampling rates are 40 kHz and 1 kHz, respectively. Each recorded sample is assigned an integer value (tick) that corresponds to a clock running at 40 kHz. Therefore, successive samples recorded for WB have tick values that increment by 1, while successive samples recorded for FP have tick values that increment by 40. Each segment timestamp in my previous message equals the tick value of the first sample recorded for that segment. Here are some predictions based on the preceding statements:
(1) For a given segment, the first recorded samples for WB vs FP channels should have tick values that differ by a maximum of 40.
(2) For FP, tick values for all recorded samples must differ by a multiple of 40.
The attached calculations from an Excel spreadsheet show that both predictions work out.
In the row "WB-FP", we see that the difference in tick values for FP and WB segments never exceeds 40.
In the row "FP(n) - FP(0) % 40", we see that the difference in tick values for each FP segment compared with the first (67) is always a multiple of 40 .
I suspect that the correct course of action depends on exactly how _segment_t_start and _segment_t_end are used; I will keep digging and make recommendations based on what I find.
Could someone please provide insight as to how to incorporate new 32- and 64-bit DLLs into the project (see my previous message from last week)? Thank you! @zm711 @h-mayorquin
Either @h-mayorquin or @JuliaSprenger set this up originally. So to be honest I don't know. Hopefully Heberto can comment on how he set it up. I only helped with the shell commands for it :)
Hi, the DLLs' home is here:
https://github.com/NeuralEnsemble/pypl2
I guess the route would be to make a PR there, but I don't know the ownership situation of that repo.
@samuelgarcia @apdavison?
Maybe I or @nikhilchandra could take ownership over it now that Julia is sadly not as present as she used to be.
Just checked and I definitely don't have write permission there. We likely need to get Julia to share it with a couple of people so we don't lose access to it!
I've given the Neo maintainers plus @h-mayorquin access to the pypl2 repo
@nikhilchandra feel free to make a PR there.
@samuelgarcia and @apdavison, could you comment on the timing question as well?
@h-mayorquin workin' on it :)
@samuelgarcia, please comment if I'm misquoting you, but I asked about your other question, Nikhil, and the idea is that the segment's t_start is just the minimum across streams: there can be staggered start times for the exact reasons you mention, but the segment itself starts whenever the first stream of data starts. Again, Sam explained this to me in the meeting and I'm writing this a few days later, so it might not be completely clear.
@zm711 @samuelgarcia After digging through the code some more, my understanding is that _segment_t_start and _segment_t_end are stream-independent, while _get_signal_t_start is per stream.
The pypl2lib library includes a stream-independent function called pl2_get_start_stop_channel_data(), so I have predicated _segment_t_start and _segment_t_end on that. Meanwhile, _get_signal_t_start does depend on the stream, and the relevant information can be gathered from the "fragment" variables returned by pypl2lib's get_analog_channel_data().
Yes, the idea is that a segment can contain multiple streams of data, so the overall start of the segment is stream-independent: it is whenever the first stream starts within that segment. The signals are the streams, so those rely on the actual stream being assessed.
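In code, my mental model of that distinction is roughly the following (only a sketch, not the actual neo implementation; the dict of per-stream first-sample ticks is hypothetical):

```python
TICK_RATE_HZ = 40_000

# hypothetical first-sample ticks per stream for one segment
first_ticks_by_stream = {"WB": 67, "FP": 67}

def segment_t_start(first_ticks_by_stream):
    # stream-independent: the segment starts when its earliest stream starts
    return min(first_ticks_by_stream.values()) / TICK_RATE_HZ

def signal_t_start(first_ticks_by_stream, stream_id):
    # stream-dependent: each signal keeps its own (possibly slightly later) start
    return first_ticks_by_stream[stream_id] / TICK_RATE_HZ
```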
Is there any place where we could add documentation of the meaning of these methods? I remember being confused about this as well at some point.
You can tack your comment onto #1508. That is the reminder to make our rawio docs better :)
There is some nuance to this problem that I did not fully appreciate earlier. I need to decide what to do -- some input would be appreciated @zm711 @h-mayorquin @samuelgarcia @alejoe91
The OmniPlex system supports trial-based recordings, which means you can pause/resume recordings as needed over the course of an experiment. While SpikeInterface deals with "segments," the PL2 format uses "fragments." Unfortunately, the concept of PL2 fragments does not map perfectly to SpikeInterface segments, as I will show below.
When you start, stop, pause, or resume a recording, this creates a timestamped event. Information about such events can be retrieved with 2 functions in the PL2FileReader API -- PL2_GetStartStopChannelInfo and PL2_GetStartStopChannelData. Note -- these events are not tied to any stream.
When retrieving analog channel data (what SpikeInterface would refer to as a "stream"), one has the option of using the API's function PL2_GetAnalogChannelData. This returns 5 variables:
For a recording with multiple segments, the function returns all data points for all fragments in a single, contiguous array. We need to use fragment_timestamps and fragment_counts to make sense of things.
HOWEVER
Suppose we conduct a recording in one continuous block, i.e., we do not pause/resume the recording at any time. This means that # of segments == 1. Such a recording can still have multiple fragments if the OmniPlex hardware gets overloaded so there are gaps in the data that actually gets recorded. In effect, there is a mismatch between # of segments and # of fragments. The gap need not be very big. I am currently playing around with a file in which the following values are returned for a supposedly continuous recording:
num_fragments_returned = 2
fragment_timestamps = [61, 8200645]
fragment_counts = [8196286, 41286384]
The first fragment occupies the tick interval [61, 8196347).
The second fragment occupies the tick interval [8200645, 49487029).
So the gap between the first and second fragments is 4,298 samples. Given the 40 kHz sampling rate, this translates to roughly 107 ms of missing data.
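For concreteness, the gap falls straight out of fragment_timestamps and fragment_counts (a sketch using the numbers above; 40 kHz is the wideband tick rate):

```python
TICK_RATE_HZ = 40_000

fragment_timestamps = [61, 8200645]
fragment_counts = [8196286, 41286384]

for i in range(1, len(fragment_timestamps)):
    prev_end = fragment_timestamps[i - 1] + fragment_counts[i - 1]  # 8196347
    gap = fragment_timestamps[i] - prev_end                         # 4298 ticks
    print(f"gap before fragment {i}: {gap} ticks (~{1000 * gap / TICK_RATE_HZ:.0f} ms)")
```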
PROPOSED SOLUTION
I think we should use the information from PL2_GetStartStopChannelInfo and PL2_GetStartStopChannelData as the ground truth -- if this information indicates that there is more than 1 segment, then reject the file for processing.
At the same time, if the number of fragments returned exceeds 1 due to one or more gaps in the data, we should do the following:
Questions:
I will read your comment soon. Today I'm swamped but I will try for the next couple days :)
Same here, this is on my radar but this week is super busy.
@nikhilchandra I totally agree with you. The segment concept in readers should be revised; it is fuzzily used both for handling gaps (like in Neuralynx) and for handling intentional pauses during recording.
I would propose a call to discuss that with @zm711 @h-mayorquin and maybe @alejoe91.
I think we should have a clear gap-handling API directly in neo.rawio, where the user could decide the tolerance for gaps and optionally retrieve the timestamps of samples. This is a massive task.
@samuelgarcia Agreed
@zm711 @h-mayorquin @alejoe91 Let me know if you'd like to do a call sometime
@samuelgarcia I am not sure if I should just proceed with my changes or if some discussion is necessary first, based on preceding comments. Please advise.
Sorry for the delay on this. We have all been very busy. We will discuss this this week; would you like to join? I will send an invite by email.
I have responded to your email. See you soon!
OK, we discussed this today. Sam proposed a general way of thinking about this but I am leaving here the relevant context to move this forward:
Semantic distinction between segments and gaps: segments are intentional, gaps are not.
In the ideal case we have an automatic way of telling these apart: intentional jumps in time become segments, unintentional ones become gaps. Gaps should not be segments, and we should create an API to collect them.
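Something like the following is how I picture that classification (purely a sketch; the start/stop event list and the tolerance parameter are hypothetical, not an existing neo API):

```python
def classify_jumps(fragment_timestamps, fragment_counts,
                   start_stop_event_ticks, gap_tolerance_ticks=0):
    """Label each jump between fragments as a segment boundary (intentional,
    backed by a start/stop event) or a gap (unintentional dropout)."""
    jumps = []
    for i in range(1, len(fragment_timestamps)):
        prev_end = fragment_timestamps[i - 1] + fragment_counts[i - 1]
        jump = fragment_timestamps[i] - prev_end
        if jump <= gap_tolerance_ticks:
            continue  # too small to care about
        intentional = any(prev_end <= t <= fragment_timestamps[i]
                          for t in start_stop_event_ticks)
        jumps.append(("segment" if intentional else "gap", prev_end, jump))
    return jumps
```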
Nikhil action items:
I will open another issue on python-neo to discuss the API thing.
Is your feature request related to a problem? Please describe.
The latest version of the PL2FileReader API includes a function capable of retrieving subsets of data from an analog channel. Switching _get_analogsignal_chunks to this function should (hopefully) remove the need to load entire channels into self._analogsignal_cache (this takes up a LOT of memory) and might also speed things up.

Describe the solution you'd like
I want to do the following:

Describe alternatives you've considered
There are none that I can think of.
Related Question
What is the proper way to handle requests that exceed the limits of a recording's segments? For example, suppose I have a multi-trial recording in which 2 recording segments run from [0, 10] and [50, 60] (units in seconds). Suppose the user requests data from [5, 15] seconds -- obviously we have data from [5, 10] from the first segment. But for the interval [10, 15] -- do we just return zeros?
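To make the question concrete, the policies I can think of for out-of-range (or between-segment) requests look roughly like this (illustrative only; get_chunk and its arguments are hypothetical, not the reader's actual API):

```python
import numpy as np

def get_chunk(signal, start_frame, stop_frame, mode="clip"):
    """Illustrative policies for requests that exceed the available data."""
    n = signal.shape[0]
    if mode == "raise":      # spikeinterface-style strictness
        if start_frame < 0 or stop_frame > n:
            raise ValueError(f"[{start_frame}, {stop_frame}) is outside [0, {n})")
        return signal[start_frame:stop_frame]
    if mode == "clip":       # numpy-slicing-like behavior: return only what exists
        return signal[max(start_frame, 0):min(stop_frame, n)]
    if mode == "zero_pad":   # keep the requested length, fill missing samples with zeros
        out = np.zeros((stop_frame - start_frame,) + signal.shape[1:], dtype=signal.dtype)
        lo, hi = max(start_frame, 0), min(stop_frame, n)
        out[lo - start_frame:hi - start_frame] = signal[lo:hi]
        return out
    raise ValueError(f"unknown mode {mode!r}")
```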