BouchardLab / process_nwb

Functions for preprocessing timeseries data stored in the NWB format
https://process-nwb.readthedocs.io/en/latest/

Compute datasets without storing in NWB (e.g. downsample before and after) #37

Closed VBaratham closed 2 years ago

VBaratham commented 3 years ago

Currently this module performs the entire downsample operation before doing anything else (CAR, wavelet), which is problematic if you want to compute wavelet amplitudes for frequencies above half the downsample rate. Our auditory analyses operate on frequency bands with center frequencies up to 1200 Hz, so we typically downsample first to 3200 Hz, perform the CAR and wavelet transform at 3200 Hz, then downsample again to 400 Hz before saving the frequency decomposition.
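For concreteness, here is a minimal numpy/scipy sketch of that two-stage pipeline. The acquisition rate and channel count are made up, and simple stand-ins replace the real CAR and wavelet steps; the point is the rate bookkeeping:

```python
import numpy as np
from scipy.signal import resample_poly

fs_acq = 12800.0   # hypothetical acquisition rate
fs_first = 3200.0  # intermediate rate: Nyquist is 1600 Hz, above the 1200 Hz top band
fs_final = 400.0   # rate at which the amplitudes are actually saved

# 1 s of fake data, 4 channels
x = np.random.randn(int(fs_acq), 4)

# first resample: 12800 -> 3200 Hz (factor of 4)
x = resample_poly(x, up=1, down=4, axis=0)

# CAR: subtract the across-channel mean (simplified stand-in)
x = x - x.mean(axis=1, keepdims=True)

# wavelet amplitudes would be computed here, at 3200 Hz, where a
# 1200 Hz center frequency is still below the 1600 Hz Nyquist limit
amps = np.abs(x)  # placeholder for the amplitude envelope

# final resample: 3200 -> 400 Hz (factor of 8), then store
amps = resample_poly(amps, up=1, down=8, axis=0)
assert amps.shape == (400, 4)
```

Downsampling straight to 400 Hz up front would put Nyquist at 200 Hz, well below the 1200 Hz bands, which is exactly the problem described above.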

Right now we have apply_* functions, which perform one processing step on a numpy array, and store_* functions, which call apply_*(), create an NWB dataset, and write it into the NWB file. I think all we need is another function that creates and returns the dataset without putting it into the NWB file. Then the process_folder script would call those functions instead of store_*() and save whichever datasets are needed itself. process_folder would also take --first-resample and --final-resample as args. We'd also need an option to specify which datasets to store (or which not to store?).
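As a hypothetical sketch of that split, using CAR as the example step (the compute_*/store_* names and the dict standing in for the NWB file are illustrative, not the actual process_nwb API):

```python
import numpy as np

def apply_car(data):
    """Existing-style apply_* step: pure numpy in, numpy out."""
    return data - np.mean(data, axis=1, keepdims=True)

def compute_car(data, rate):
    """Proposed layer: build the dataset (array + metadata) without storing it."""
    return {"name": "CAR", "data": apply_car(data), "rate": rate}

def store_car(nwbfile, data, rate):
    """store_* would then just wrap compute_* and write the result."""
    dset = compute_car(data, rate)
    nwbfile.setdefault("processing", []).append(dset)  # stand-in for the NWB write
    return dset
```

process_folder could then call compute_car() always, and store_car() only when that dataset is on the to-store list.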

JesseLivezey commented 3 years ago

@VBaratham, this seems like two issues: (1) we typically want to downsample the final amplitudes, and (2) we should have more flexibility in what information gets saved.

VBaratham commented 3 years ago

Yeah, it is two separate issues. I think 1 sort of depends on 2 b/c it's not so helpful to downsample the final amplitudes if you still have to store all the intermediate datasets. In any case I'm hoping to implement them together.

JesseLivezey commented 3 years ago

@VBaratham one problem with not writing (some of) the intermediate results is that they then have to be held in memory at certain points during preprocessing. For instance, if the wavelet amplitudes are not written to the file, they have to be kept in memory so they can be resampled before being stored.

I suppose that we could create a joint pipeline that does multiple steps inside of an iterative write.

JesseLivezey commented 3 years ago

Which is only a problem for datasets that might not fit in memory.

JesseLivezey commented 3 years ago

3 potential pipelines are

  1. Store all (used if someone is looking at data at different points in pipeline). Individual ops can be iterative, if needed.

    • resample to first_resample (store) (maybe initial_resample_rate)
    • CAR and linenoise notch (store)
    • wavelet amplitudes (store)
    • resample amplitudes to final_resample (store) (maybe final_resample_rate)
  2. Store final (simple for datasets that fit in memory). Read from NWB, apply in numpy, write to NWB

    • apply resample to first_resample
    • apply CAR and linenoise notch
    • apply wavelet amplitudes
    • apply resample amplitudes to final_resample (store)
  3. Store final (iterative write, for datasets that do not fit in memory). Read from NWB as needed to compute values.

    • iteratively read and apply resample to first_resample. Downsampled data must all fit in memory.
    • apply CAR and linenoise notch
    • iteratively apply wavelet amplitudes and resample (store)

2 and 3 would produce the same NWB file but could be selected based on memory limitations. 1 would produce a larger NWB file.
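A rough sketch of option 3's final stage, assuming the CAR'd data at first_resample fits in memory while the full channels × bands × time amplitude array may not. Channels are independent for the amplitude computation, so one natural chunking is per channel; the Hilbert envelope here is only a stand-in for the real wavelet filter bank:

```python
import numpy as np
from scipy.signal import hilbert, resample_poly

def iter_amplitude_chunks(data, down):
    """Yield amplitude chunks for an iterative NWB write.

    data: (time, channels) array at first_resample, already CAR'd.
    down: integer factor taking first_resample to final_resample.
    Each channel's envelope is computed and resampled on its own,
    so only one channel's amplitudes are ever in memory at once.
    """
    for ch in range(data.shape[1]):
        amp = np.abs(hilbert(data[:, ch]))          # amplitude envelope (stand-in)
        yield resample_poly(amp, up=1, down=down)   # resample before storing
```

An iterative NWB writer (e.g. a DataChunkIterator-style consumer) could then drain this generator one channel at a time, so only the final 400 Hz amplitudes ever land in the file.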

VBaratham commented 3 years ago

Yeah, I didn't realize how complicated the issue of first vs final resample makes this. I agree that those options all sound reasonable and I can't think of any other possibilities.

For 3, how about if we add "resample to final_resample" as part of the wavelet step? In the current pipeline, I think they are the only 2 steps that need to be combined. It should be simple to do iteratively or all at once.

We can use --first-resample-rate and --final-resample-rate if you prefer the longer argument names

I'll get my iterative write PR cleaned up and try to take a crack at this. Let me know what you think.

JesseLivezey commented 2 years ago

Closed by #60