Why not just work with discontinuous Neuralynx data in (micro)seconds without trying to convert to sample indices?

dustinf1989 commented 2 years ago

Is your feature request related to a problem? Please describe. There is a problem with converting discontinuous Neuralynx data from its native microseconds to sample indices, and a rather complex solution described on this page. In all Neuralynx data, the timestamps are synchronized between event (nev), continuous (ncs), and spike (ntt) files, so no adjustment is needed. Both the events and the data use the same timestamp system (in microseconds), which is constant and is not altered by starting or stopping acquisition.

Describe the solution you'd like It should be possible to work with discontinuous Neuralynx data (or any data that is natively recorded in time units instead of sample indices) in time units (e.g., seconds). Therefore, ft_definetrial() should have an option to input data in seconds and output 'trl' in seconds instead of samples. ft_spike_maketrials() would work more intuitively in seconds without having to define 'TimeStampPerSample'. At the end of the day, when you are plotting or showing your data to anyone, you likely show it in time, not sample indices. Converting (micro)seconds to samples using an error-prone, lengthy process just to convert back to seconds again for plotting seems unnecessary. Perhaps there are further dependencies in the FieldTrip toolbox that only work with samples, but it would be nice if everything also worked in time (e.g., seconds).

Describe alternatives you've considered I've already edited some functions to work with seconds and wrote some of my own. I want to use FieldTrip since it has many useful functions for analyzing spike-LFP correlations, but this conversion is very tedious and error-prone.

robertoostenveld commented 2 years ago

Hi @dustinf1989, you are free to do so, but the FieldTrip ft_read_data and ft_preprocessing functions are data format independent and expect continuous data with the same sampling rate in each channel. See https://www.fieldtriptoolbox.org/development/module/fileio/. I agree that is inconvenient for data represented for example as ncs/ntt, but it is convenient for most of the data types that we care about.

The wrappers that you write around the functions should return the data according to the ft_datatype_raw and the ft_datatype_spike formats to ensure that they can be further processed and plotted with FieldTrip functions.

Please take the code you need (it is under the GPL license, so you can reuse it) and adjust it or make your own wrappers around it. A good starting point would be to take the read_neuralynx_xxx functions from fileio/private. Note that some are not actual Neuralynx file formats but our own modifications/organization of the data.

dustinf1989 commented 2 years ago

Hi @robertoostenveld, thanks. I know I can edit the functions myself, and I have done so to create two new functions, ft_appendspike_sec() and ft_trialfun_general_sec(). So far these work with my specific case, but the pieces of code skipped by if-else statements are unchanged, so they are still not completely generalized. ft_appendspike_sec.txt ft_trialfun_general_sec.txt

How can I identify which other functions will have issues with using data represented in seconds? It seems everything that requires timestampspersecond as an input won't work natively (e.g., ft_spike_maketrials), but I only just started using FieldTrip and it seems like this is deeply ingrained in the toolbox. Until now I have analyzed spike train data exclusively, and now I want to incorporate LFP, so I would like to use the toolbox you've written, but I'm trying to see how much extra work it will be to adapt everything to work natively in seconds.

In response to "I agree that is inconvenient for data represented for example as ncs/ntt, but it is convenient for most of the data types that we care about." I would also say that converting to time (seconds) is a requirement for all time series data analysis, so why not do this conversion immediately when loading the data? You even mention in this tutorial, "A disadvantage of the second method relative to the first methods is that the spike times are converted to samples, such that we introduce a (minor) distortion of the estimated spike-LFP phases." This disadvantage could be avoided completely just by computing everything in time.
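To put a rough number on that distortion, here is a minimal sketch in Python (not FieldTrip code). It assumes that a spike time is rounded to the nearest LFP sample, so the timing error is at most half a sample period. The function name and the example frequencies are hypothetical and chosen only for illustration.

```python
def max_phase_error_deg(freq_hz, fs_hz):
    """Worst-case phase error, in degrees, for an oscillation at freq_hz
    when a spike time is rounded to the nearest sample at fs_hz.

    Assumption: rounding to the nearest sample, so the timing error is
    at most half a sample period.
    """
    max_time_error_s = 0.5 / fs_hz
    # One full cycle (360 degrees) lasts 1 / freq_hz seconds.
    return 360.0 * freq_hz * max_time_error_s


# Hypothetical example: a 50 Hz rhythm with the LFP sampled at 1000 Hz
# gives a worst-case error of about 9 degrees.
print(max_phase_error_deg(50, 1000))
```

The error scales linearly with the oscillation frequency and inversely with the sampling rate, so it matters most for high-frequency rhythms in heavily downsampled LFP.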

Why is it so important that time is continuous for using fieldtrip? In discontinuous Neuralynx recordings, timestamps (microseconds) are repeated during pauses in the recordings. If you simply multiply timestamps (seconds) by sampling rate (Hz), you can get a continuous time axis and just remove repeated values. Why construct a continuous time axis based on the start and end times and add NaNs to fill in discontinuous periods when the original timestamps are continuous?
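As an illustration of working directly in time units, the following Python sketch (not FieldTrip code) splits a list of microsecond timestamps into contiguous segments and converts them to seconds without any NaN padding. It assumes one integer timestamp per sample; the function names and the gap tolerance are hypothetical.

```python
def split_at_gaps(timestamps_us, fs_hz, tolerance=1.5):
    """Return (first_index, last_index) pairs of contiguous segments.

    A gap is declared when two consecutive timestamps are more than
    `tolerance` nominal sample periods apart (a hypothetical threshold).
    """
    if not timestamps_us:
        return []
    period_us = 1e6 / fs_hz
    segments = []
    start = 0
    for i in range(1, len(timestamps_us)):
        if timestamps_us[i] - timestamps_us[i - 1] > tolerance * period_us:
            segments.append((start, i - 1))
            start = i
    segments.append((start, len(timestamps_us) - 1))
    return segments


def to_seconds(timestamps_us, t0_us=None):
    """Convert microsecond timestamps to seconds relative to t0_us
    (defaults to the first timestamp)."""
    if t0_us is None:
        t0_us = timestamps_us[0]
    return [(t - t0_us) / 1e6 for t in timestamps_us]


# Hypothetical recording at 1000 Hz with a pause after the third sample.
ts = [0, 1000, 2000, 10000, 11000]
print(split_at_gaps(ts, 1000))  # two segments: indices 0-2 and 3-4
print(to_seconds(ts))
```

Each segment keeps its original timestamps, so events and spikes recorded on the same clock can be matched to it by time alone.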

robertoostenveld commented 2 years ago

Why is it so important that time is continuous for using fieldtrip?

FieldTrip is primarily a toolbox for MEG, EEG and (human) iEEG analysis. The spike representation is an add-on that is not used by many people (which is also why it is in contrib/spike, not in fieldtrip proper).

You are looking at it from the perspective of spikes, but I primarily look at it from the perspective of continuously sampled data such as MEG, EEG and iEEG. LFPs are still quite similar to iEEG, but with spikes it starts to diverge.

Had we started the toolbox with spike analysis, we might have designed the data structures, the functions, and the (example/tutorial) analysis pipelines differently. I am interested in your suggestions, but think that it would be better to center the discussion around analysis pipelines and data representations. For example, what is wrong (according to you) with the ft_datatype_spike representation?

dustinf1989 commented 2 years ago

@robertoostenveld thank you for considering my suggestions. As someone new to the toolbox, I find ft_datatype_spike confusing because the unit (samples or seconds) of spike.timestamp is unspecified, and it took me time to decipher that it is normally assumed to be in samples, not seconds. It would be helpful to have the unit specified and, ideally, to have two fields, one for each. From a data pipeline perspective starting with spiking data, I would say it's important to analyze stimulus-locked activity or cross-correlations between spike trains first, which would be done in the time domain. After that, I would move to spike-LFP (or spike-iEEG) phase correlations, which I personally would find to flow most naturally using time instead of samples.

robertoostenveld commented 2 years ago

The unit of spike.timestamp is acquisition system specific and depends on the original data.

For Plexon, timestamps are in samples at 40 kHz, which is the highest sampling rate of the system and is used for the spikes, whereas the LFP is sampled at a lower rate. For Neuralynx, it is microseconds (rounded off to the nearest integer), corresponding to approximately 31 timestamps per sample at 32 kHz. In both cases it is expressed as (long) integers, and not in seconds, which would require it to be a floating point number.
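The Neuralynx figure follows from simple arithmetic: one million microsecond ticks per second divided by 32000 samples per second gives 31.25 ticks per sample. A minimal Python sketch (not FieldTrip code), with hypothetical function names and assuming a nominal rate of exactly 32 kHz:

```python
TICKS_PER_SECOND = 1_000_000  # Neuralynx timestamps are in microseconds


def timestamps_per_sample(fs_hz):
    """Number of timestamp ticks that elapse during one sample."""
    return TICKS_PER_SECOND / fs_hz


def timestamp_to_sample(ts_us, first_ts_us, fs_hz):
    """Zero-based index of the sample nearest to timestamp ts_us,
    counted from the sample recorded at first_ts_us."""
    return round((ts_us - first_ts_us) / timestamps_per_sample(fs_hz))


print(timestamps_per_sample(32000))                    # 31.25
print(timestamp_to_sample(1_000_100, 1_000_000, 32000))  # 3
```

Because 31.25 is not an integer, converting a timestamp to a sample index always involves rounding, and a real recording's sampling rate may differ slightly from the nominal value.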

robertoostenveld commented 2 years ago

I have added this to the respective "getting started" pages.

robertoostenveld commented 2 years ago

Looking at the respective pages on the website, I also found https://www.fieldtriptoolbox.org/getting_started/animal/#synchronizing-with-timestamps to be relevant. In your specific case, I can imagine part of the confusion stems from you working with Neuralynx data yourself while the tutorials are based on Plexon data.

schoffelen commented 2 years ago

Does this still need work from our end, or can it be closed? @dustinf1989

fieldtrip / fieldtrip

Why not just work with discontinuous Neuralynx data in (micro)seconds without trying to convert to sample indices? #2063