Wrong calculation of sampling frequency

mwawra commented 3 years ago

Describe the bug When I import a Neuralynx dataset with multiple single-channel Neuralynx files (internally read by read_neuralynx_ds) there is a correction of the sampling frequency (see line 101-105 in read_neurlynx_ds.m) when a "SubSamplingInterleave" is found in the header. But this seems to be unnecessary since orig(i).hdr.SamplingFrequeny is already the right frequency. Since I only got data to analyze from somebody else and don't have a recording system I can't verify this further.

To Reproduce Steps to reproduce the behavior:

open Matlab and load fieldtrip

run:

cfg         = [];
cfg.dataset = '/path/to/neuralynx data set/';
data        = ft_preprocessing(cfg);

Check data.fsample

Environment and versions (please complete the following information):

Operating System: Windows 10
MATLAB version R2020b
FieldTrip version 20210602

schoffelen commented 3 years ago

Uhm, it's not clear what the problem is. Do you get a wrong data.fsample? Also, of course it's good to provide a snippet of code, but in absence of any data it is not possible for us to reproduce it (whatever this 'it' is, see above).

mwawra commented 3 years ago

Sorry, since I don't have a recording system I can't generate example data. But yes, I got the wrong data.fsample. It is divided by "SubSamplingInterleave", what is unnecessary, because this was already done by the recording software.

robertoostenveld commented 3 years ago

Can you share some data (preferably a small file) and possibly the documentation on the file format? See https://www.fieldtriptoolbox.org/faq/how_should_i_send_example_data_to_the_developers/

The only examples of this data that we now have are from 2005, and specify a SubSamplingInterleave of 1.

I do have some email correspondence from 2008 with a former colleague that relates to this. I would have to check the code history to see whether and how that was resolved back then.

mwawra commented 3 years ago

I tried to record some small sample files in demo mode of Neuralynx Cheetah 5.6.3, 5.7.4 and 6.3.2 (only versions available online). The thing is, that this versions apparently don't set the SubSamplingInterleave in the header anymore, even if I record test data with a SubSamplingInterleave higher then 1. The data I have was recorded with Cheetah 5.1.0 back in 2008. Also in the documentation on the file format there is no detailed description of what is in the header.

robertoostenveld commented 3 years ago

I searched my computer for references and found this 2008 message from (back then my colleague) Thilo

------------------------------------------------------------
1) generally the "online" analysis seem to work (if you are interested recent functions are on coherence/thiwom/matlab/onlineAnalysis), "but" the resampling of the data in matlab is very slow, so that I got lynx channels with the lower sampling frequency (interleaved samling only every 32 sample). This causes the following problem:

---problem: sampling frequency of Cheetah-written ncs is not read-out correctly when data is sampled at a rate different than 32556 (with SubSamplingInterleave specified to be >1)
when reading an ncs file sampled from the Cheetah software at e.g. 1000Hz, then

---location(s): read_neuralynx_ds.m (and: read_neuralynx_ncs / neuralynx_getheader)
line 106---    SamplingFrequency(i) = orig(i).hdr.SamplingFrequency
... SamplingFrequency(i) is 32556 even if software sampled at (32556 / 32) Hz and First / LastTimesStamp suggests the correct 1000Hz Fs

---potential solutions:
a- make use in read_neuralynx_ds.m of the header variable
orig(1).hdr.SubSamplingInterleave (e.g. line 106 SamplingFrequency(i) = orig(i).hdr.SamplingFrequency / orig(i).hdr.SubSamplingInterleave;
); if that is >1 then Fs needs to be corrected. [Note that the division cause the following warning: Warning: Integer operands are required for colon operator when used as index > In fieldtrip/private/read_neuralynx_ds at 198].
Or: 'only' consider diff(FirstTimestamp, LastTimestamp) / nSamples
------------------------------------------------------------

So according to Thilo back then the SubSamplingInterleave was to be used as a divisor to correct the SamplingFrequency. Could you contact Neuralynx support and get a confirmation that - when SubSamplingInterleave is specified - it should be ignored since the SamplingFrequency field is already updated? And, if that is indeed the case, to which (historical) versions of the file format the one or the other implementation should apply?

To me it sounds as if Neuralynx started with SamplingFrequency, at a certain point added SubSamplingInterleave, then changed the implementation so that SamplingFrequency was divided and SubSamplingInterleave not really needed any more, and then only later figured out that SubSamplingInterleave was conflicting and hence removed it from later versions of their software.

mwawra commented 3 years ago

@robertoostenveld, thank you for your interest in that! I will contact Neuralynx and come back when they answered me.

mwawra commented 3 years ago

Hello @robertoostenveld, I got some information from Neuralynx. They wrote me the following: "The header already has the calculated sample rate and takes into account the subsampling. There is no need to further divide the sampling by the subsample value." Also I found some short testing recordings in the old dataset. I loaded 2 example files from that recording to github and invited you to this repository. Interestingly, it also makes a difference if you just load one file or every file in a path: If you have the following structure: /path/ ... CSC01.ncs ... CSC02.ncs

cfg         = [];
cfg.dataset = '/path/'; % <- wrong sampling frequency
data        = ft_preprocessing(cfg);

vs.

cfg         = [];
cfg.dataset = '/path/CSC01.ncs'; % <- correct sampling frequency
data        = ft_preprocessing(cfg);

robertoostenveld commented 3 years ago

thanks, I downloaded the files.

The comment of Neuralynx is to be taken as conclusive on this matter, since it is their format.

The reason for the single-file versus directory-with-multiple-files behaving different is that the SubsamplingInterleave is somehow only implemented in https://github.com/fieldtrip/fieldtrip/blob/463ffad122c0e27bd6b7d2bb5889319d4845c4ce/fileio/private/read_neuralynx_ds.m#L101 and not in the read_neuralynx_ncs implementation that it calls. It would have been more logical if read_neuralynx_ncs would have done it, that is also the function called when you specify a single *.ncs file as dataset.

I will remove it from read_neuralynx_ds.m, add a warning there (only to be given once), and add a test script based on your data

mwawra commented 3 years ago

thanks, I downloaded the files.

The comment of Neuralynx is to be taken as conclusive on this matter, since it is their format.

I totally agree, but I'm also still wondering why your former colleague read out a wrong frequency during the "online" analysis and had to correct it.

The reason for the single-file versus directory-with-multiple-files behaving different is that the SubsamplingInterleave is somehow only implemented in

https://github.com/fieldtrip/fieldtrip/blob/463ffad122c0e27bd6b7d2bb5889319d4845c4ce/fileio/private/read_neuralynx_ds.m#L101 and not in the read_neuralynx_ncs implementation that it calls. It would have been more logical if read_neuralynx_ncs would have done it, that is also the function called when you specify a single *.ncs file as dataset.

I will remove it from read_neuralynx_ds.m, add a warning there (only to be given once), and add a test script based on your data

Sounds like a good solution. Thanks again for your support here!

fieldtrip / fieldtrip

Wrong calculation of sampling frequency #1780