Closed jthermiz closed 3 years ago
I think found an alternative resampling function doesn't produce this error. The spikeinterface
resampling method (underlying implementation is scipy.signal.resample
) did not crash for RVG16_B06
. The output is saved to /clusterfs/NSDS_data/nwb/RVG16/RVG16_B06_resampled.py
. I implemented this approach on my fork in the no_resample
branch: https://github.com/jthermiz/nsds_lab_to_nwb/tree/no_resample. However, I'm not sure if the resampling method outputs desirable results. I believe they do but we should check.
@izzy-baldacci Can you please compare these resampling functions process_nwb.resample.resample
vs spiketoolkit.preprocessing.resample
(and perhaps scipy.signal.resample
) packages. Using a small snippet of data, do you see any difference between the signals? If so, can you characterize the differences by plotting the residuals over time and as a distribution. It may also be informative to plot the power spectral density of these signals.
Resampling to the nearest kHz takes forever! To make our lives easier, I strongly advocate we only resample once during the analysis phase (eg. from 12.4kHz to 3kHz). @KrisBouchard What would it take for your to be convinced that fractional sample rates are not evil?
Finally scipy.signal.resample
by default uses a time domain approach, but there is also FFT approach. I believe that the process_nwb.resample
was adapted from the FFT approach. @JesseLivezey Is that right? And if so, is there a reason you favored the FFT based approach?
One last thought is that if we end up using an FFT based approach, does it make sense to directly resample in the frequency domain while doing the wavelet transform (ie. only transform a subset of frequency)? That away we only convert to frequency space once as opposed to ping ponging back and forth between the time and frequency domains, which is inefficient. As we introduce neuropixels, we will have potentially 10x more data. So to the extent we can make the pipeline scalable, the easier our lives will be :)
@jthermiz I think that's a new error :(. Which block is this on?
These problems seem to happen for the mkl fft library. A few years ago the mkl library was sometimes 10-100x faster than the default numpy or scipy routines. It might make sense to do a little benchmark and just use numpy or scipy if they don't have as many problems. I can run a few tests. It looks like scipy has some parallelization. which could also help.
The resample will be by far the slowest operation for NWB conversion with the current pipeline. I'd also advocate for not resampling, if that's acceptable. We'd eventually have to resample, but it would be during the wavelet steps, which are also slow.
@jthermiz I think the scipy.signal.resample
docs are a little confusing. It does frequency domain resampling. The domain='time'
argument specifies whether the input is in the time domain or if it has already had a fft applied to it. See here
https://github.com/scipy/scipy/blob/44e31b1eb202ef91caad059952ba268d2610a60b/scipy/signal/signaltools.py#L3028
Doing time-domain resampling is extremely slow for signals of this length.
process_nwb.resample
does the same basic operations as scipy.signal.resample
, but we do padding before resampling. If you don't pad the signals, you can get some pretty big artifacts from the frequency domain resample near the edges of the resampled signal, which is where baselines are often computed from. It might also be possible to zero pad using the scipy routines without making a copy of the array. Needs some testing.
@JesseLivezey I ran into this error when trying to convert this block: RVG16_B06
Re: time vs frequency domain: That makes sense.
Re padding: Depending on how long the artifact lasts for and alternative is to throw away samples in the beginning and end of the recording. It usually takes us 5-10 sec to press play on dvd player (the play button on the dvd player is not so responsive these days...). If the artifact last for < 1 sec, I'd vote for karate chopping off dt (where dt <= 1 sec) at the beginning and end of the data. :martial_arts_uniform:
I'm going to switch process_nwb
to using scipy's fft for now. It doesn't have a problem with that block and it's reasonably fast. I'll cut a new pip version so the default install method works.
pip uninstall process-nwb
pip install process-nwb
should fix this.
I was able to resample a full tone ECoG (128 channels @ 12.xxx kHz) down to 4kHz on catscan.
I believe this is a known issue in
process_nwb
, but I get this when trying to run the pipeline with the resample flag set toTrue