LBHB / NEMS0


New signal / epoch functions wish list #57

Closed · svdavid closed this issue 6 years ago

svdavid commented 6 years ago

After yesterday's conversations (@christianbrodbeck, @crheller mostly), I've realized there are a few bits of functionality that would be useful. Do any of these exist yet?

create_signal_from_3d_array(data, epoch_name="TRIAL", **other_init_args): since data are often stored in channel X time X trial format, this function could unwrap it into 2D and create epochs automatically. Each epoch could be given the same name. Alternatively, maybe the user would provide a list of epochs with a name for each trial? Returns a new signal.
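As a rough illustration of the unwrapping (purely a sketch of the proposal; the constructor arguments, import path, and epoch column names are assumptions, not the existing NEMS0 API):

import numpy as np
import pandas as pd
from nems0.signal import RasterizedSignal  # import path is an assumption

def create_signal_from_3d_array(data, fs, name, recording,
                                epoch_name="TRIAL", **other_init_args):
    # data: channel x time x trial; concatenate trials along the time axis
    n_chans, n_times, n_trials = data.shape
    data_2d = data.transpose(0, 2, 1).reshape(n_chans, n_trials * n_times)
    # one epoch per trial, all sharing the same name
    trial_dur = n_times / fs
    starts = np.arange(n_trials) * trial_dur
    epochs = pd.DataFrame({'start': starts,
                           'end': starts + trial_dur,
                           'name': epoch_name})
    return RasterizedSignal(fs, data_2d, name, recording,
                            epochs=epochs, **other_init_args)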

create_signal_from_dictionary(data, **other_init_args): same idea, but expects a dictionary of 2D arrays.

RasterizedSignal.normalize(norm_method="none"/"minmax"/"meanvar"): normalizes each channel in _data to have either (min, max) = (0, 1) or (mean, var) = (0, 1), depending on the parameter passed. This will be useful for a number of fitting algorithms that prefer input values in a "normal" range. Maybe norm_method should be part of the init function? Store the normalization values (d, g) in metadata so that...

RasterizedSignal.as_continuous(denormalize=True) returns the original (pre-normalized) data matrix
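For reference, the (d, g) bookkeeping could be as simple as the following (a hedged sketch, not existing NEMS code; the helper name is made up):

import numpy as np

def compute_norm_terms(data, norm_method):
    # per-channel offset d and gain g chosen so that
    # normalized = (data - d) / g  and  raw = normalized * g + d
    if norm_method == 'minmax':
        d = np.nanmin(data, axis=1, keepdims=True)
        g = np.nanmax(data, axis=1, keepdims=True) - d
    elif norm_method == 'meanvar':
        d = np.nanmean(data, axis=1, keepdims=True)
        g = np.nanstd(data, axis=1, keepdims=True)
    else:  # 'none'
        d = np.zeros((data.shape[0], 1))
        g = np.ones((data.shape[0], 1))
    g[g == 0] = 1  # guard against flat channels
    return d, g

Storing d and g with the signal is what would make as_continuous(denormalize=True) cheap: it is just _data * g + d.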

RasterizedSignal.excise_epoch(epoch_name): remove all epochs matching epoch_name (and any overlapping epochs), splice the corresponding segments out of _data, and adjust the start/stop timestamps of the remaining epochs so that they match the new data matrix. Maybe the user could also provide a list of different epoch names? @bburan: I know this is similar to the current segment scheme, so please let me know if this is a solved problem. Though I think currently epoch times are NOT updated?

RasterizedSignal.restore_excised_segments(): splice NaNs back in for the segments that were removed.

RasterizedSignal.plot(): a very simple way of inspecting data without importing numpy; maybe allow users to provide some parameters specifying non-default ways to plot (average across epochs, generate a heatmap rather than lines, etc.).

BaseSignal.add_epoch(name, start, stop): Dumb, but this might be nice for users who aren't familiar with pandas and need to work with epochs. Maybe this already exists?
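Just to illustrate how thin a wrapper this would be over the epochs DataFrame (illustrative only; an add_epoch method already exists, as noted below, and its actual signature may differ):

import pandas as pd

def add_epoch(signal, name, start, stop):
    # append a single (name, start, end) row to the signal's epochs
    # DataFrame; for simplicity this modifies the epochs in place
    row = pd.DataFrame({'name': [name], 'start': [start], 'end': [stop]})
    if signal.epochs is None:
        signal.epochs = row
    else:
        signal.epochs = pd.concat([signal.epochs, row], ignore_index=True)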

bburan commented 6 years ago

I can provide better commentary if you can provide more concrete use-cases. Some initial thoughts:

RasterizedSignal.normalize: Will that return a new instance of Signal (preferred), or operate on the signal in-place? In general, I do not think we should manipulate signals in-place because modulespec[0] may expect the signal to be normalized, but modulespec[5] may expect it to be unnormalized. If modulespec[0] and modulespec[5] are both working with the same signal where modifications are occurring in-place on the fly, then they may interfere with each other. If, however, we have two copies of the signal (one normalized and the other not), then the modules will not interfere with each other.

RasterizedSignal.as_continuous(denormalize=True): See above. Means the programmer has to check to see whether they're working with a "raw" or normalized signal before specifying.

Why not just have RasterizedSignal.as_continuous(normalize='minmax') and RasterizedSignal.extract_epochs(normalize='minmax')? Let the normalization occur when fetching the data rather than making it a state of the signal that requires checking by all functions that use the signal. If you know that you always (and only) want to work with the normalized data, why not just have RasterizedSignal.normalize return a new RasterizedSignal? When you need the unnormalized data, you work with the original signal. When you need the normalized data, you work with the normalized signal:

recording['stim'] = RasterizedSignal()
recording['stim_normalized'] = recording['stim'].normalize('minmax')

I've already responded to @crheller on Slack regarding part of the functionality you are proposing with RasterizedSignal.excise_epoch. The problem is that you get into some very challenging situations when adjusting the timestamps of epochs. I found it was much simpler to solve the problem by keeping the epochs intact and letting get_epoch_indices deal with this. You need to provide a use-case explaining why this is a problem before you can convince me it's better to do it when excising segments from the signal rather than when using get_epoch_indices.

Aside from that, RasterizedSignal.excise_epoch is basically a wrapper around RasterizedSignal.select_times? How is that better than the following approach? Yes, I get that it is slightly less verbose (by all of one line) but is that a good enough reason to add another method to RasterizedSignal? The more methods we add, the more confusing the Signal class becomes as new users need to explore each and every method to figure out which one is the best for their use-case (especially if methods have similar names).

times = signal.get_epoch_bounds(boundary_mode='exclude')
signal_subset = signal.select_times(times)

How is RasterizedSignal.plot better than adding a function, plot_signal, that takes a signal as an argument (and provides parameters for various options)?

Will RasterizedSignal.restore_excised_segments return a new signal?

RasterizedSignal.add_epoch exists. I think you're proposing changing how the parameters are defined. Can you be more specific? Is start really starts (i.e., a 1D array, list or tuple) and end really ends?

bburan commented 6 years ago

I ignored the first two suggestions. They are fine with me. I'm going to file a new issue with some thoughts that are relevant to them.

bburan commented 6 years ago

By the way, these are great suggestions. Thanks @christianbrodbeck and @crheller. But, let's work on hammering out how these really will work before actually implementing them. Measure twice, cut once. Or, should I say discuss twice, code once?

bburan commented 6 years ago

Pasting from Slack for reference:

@bburan Question about the select_times method... Is there a way to get this to update the epochs as well? Right now, the new signal (or recording) returned by select_times still keeps the epochs from the old signal, so you can't use any epoch methods on the new signal. I can try to implement this, but just wondering if it's already been taken care of somewhere and I'm missing it?

bburan [7:41 AM]: That's because you need to keep all epochs for the epoch manipulation methods to work properly, even on signals that are a subset of the original signal. For example, if you were to implement epoch elimination in select_times, how do you decide which epochs to eliminate? Do you eliminate epochs that are not fully contained in the selected timepoints? What about epochs only partially contained? Instead, I decided that it's best to handle this via Signal.get_epoch_indices after you have called select_times. That's where it checks whether each epoch is valid: only the subset of epochs that fall within the times you selected (in combination with the settings for boundary_mode and fix_overlap) is returned. The expectation I had when I helped implement the epoch system was that all epoch manipulation methods should pass epochs through this method before doing anything else. Can you please explain specifically what's not working? It's probably an edge case that I didn't anticipate, or a bug.

svdavid commented 6 years ago

@bburan For the normalization thing: I think it's important to keep the actual data matrix normalized because it will get called over and over again during fitting. The only time you would un-normalize is before plotting figures. There are some complexities if we want to apply the same model to a recording not involved in fitting (apply norm terms from recording 1 to recording 2), but that's a low priority since we don't do that at the moment.

And, yes, it should go without saying that anything that modifies a signal should return a new copy. Now I think the norm thing should actually be part of init, though.

For plotting, I don't have a really strong opinion about how it's implemented (method or function). The appeal of using a method is that it doesn't require the user to import anything new. But the more important point is that it's easy to use so that it's easy to look at the data.

It sounds like the other stuff is more or less implemented already, which sounds great.

bburan commented 6 years ago

@svdavid: That's fine. But, in that case, my suggestion should address both use-cases. Have Signal.normalize return a modified copy of the signal that's saved to the Recording as stim_norm (assuming the original signal was called stim). Therefore, the normalization is computed only once as part of the preprocessing (you can even add an xforms function for this).

The modulespecs will be informed that they need to use stim_norm when fitting (the initializer can be smart enough to specify that stim_norm should be used as the input based on the arguments provided to it). The plotters will be informed to use the raw signal.

recording['stim'] = Signal()
recording['stim_norm'] = recording['stim'].normalize(method)
modulespec = initialize_from_keywords(spec_string, first_input='stim_norm')
do_fit(...) # modulespec will use `stim_norm`.
plot_signal(recording['stim'])


jacobpennington commented 6 years ago

create_signal_from_dictionary(data, **other_init_args): same idea, but expects a dictionary of 2D arrays.

Is there any reason TiledSignal wouldn't be able to handle this (the one we made for baphy's stim data)? It takes in a dictionary of 2d arrays as its data argument.

svdavid commented 6 years ago

@jacobpennington Sure, that sounds great.

@bburan Can you help me understand what's wrong with the original proposal in your eyes? Your solution is workable, but it would increase memory consumption and add a new degree of complexity to the modelspec side of things. As far as I'm concerned, you should never NOT normalize during fitting, because many fitters fail if you haven't normalized. If it would make things less confusing, I could spec it out as .as_continuous(norm=True) as the default and .as_continuous(norm=False) as an option to return the original. Then the only thing we have to change in our own code is the plotting functions.

The complexity with using stim_norm in the modelspec is that we'd have to support two options in various keywords, and the plot function would have to be smart enough to chop off the _norm when it plots. Or it would have to be hard-coded to use "stim", which seems un-pythonic.

bburan commented 6 years ago

@svdavid Are you proposing that a signal cache both its normalized data (computed upon first call) and its raw data, so that as_continuous(normed='minmax') returns the normalized signal and as_continuous(normed=None) the raw signal? Then will all methods that extract data (e.g., extract_epoch) be updated to take normed as a keyword?

svdavid commented 6 years ago

@bburan Nope, I'm saying store only the normalized signal (since that's how we'll want it 99.9% of the time), and then optionally re-scale it to its pre-normalized state if a plotting function asks for it.

sig = nems.Signal(data=data_matrix, normalize="minmax", **otherargs)

Then the fitter calls sig.as_continuous() to retrieve _data in its normalized state (without any extra computation required), and the plotter calls sig.as_continuous(normalized=False) to retrieve _data in its pre-normalized state.

bburan commented 6 years ago

Ok. In that case, let me suggest the following behavior:

# In a loading or preprocessing step, load the signal as-is.
raw_signal = Signal()

# Now, normalize it, which returns a normalized copy of the signal with information on how to reverse the normalization. Save this normalized signal object to the recording. This is what you will then use from that point on. 
recording['stim'] = raw_signal.normalized('minmax')

# By default returns the raw array regardless of whether it's been normalized or not.
recording['stim'].as_continuous() 

# Returns the normalized array. It's best for the modules to be explicit about what type of signal they want. If the signal is an instance of RawRasterizedSignal, then it will either compute the normalization on the fly (and log a warning so you know that you have an inefficient model) or raise an error. Up to you how you do it.
recording['stim'].as_continuous(normalized=True)

The biggest problem with this approach is that the module doesn't know how the signal was normalized when it requests as_continuous(normalized=True).

In general, I strongly recommend that as_continuous() always return the raw signal. This ensures that there are no surprises. By explicitly requesting the normalized data, it's easier to catch coding errors.

bburan commented 6 years ago

Also, does this 99.9% of use-cases include the resp and state data? If not, it's really closer to 33% of use cases (e.g., stim, resp, state). These are all going to be instances of RasterizedSignal. RasterizedSignal.as_continuous should not behave differently (using the default arguments) based on whether the signal was pre-normalized or not.

svdavid commented 6 years ago

Ok, that seems reasonable. And touché, it is only 33.3% of the cases (or maybe 66%, since we will want to normalize pupil). But my thought going into this was that most model fits just start by normalizing the stimulus and then never think about it again. So I figured, why not just provide a normalized stimulus by default and let the special cases opt out when they don't want the signal normalized? Making modelspecs smart about this can be a bit of a headache, since only the first module takes stim as its input, and which module comes first varies between models.

bburan commented 6 years ago

I see what you're saying. What are the main entry-point modules? WC is one. What about others?

svdavid commented 6 years ago

Pretty much any module can be an entry point... dlog, stp, fir, stategain.


bburan commented 6 years ago

Ok. One follow-up question: what's the advantage of plotting the unnormalized data? Aren't the plots following a fit mainly diagnostic?


bburan commented 6 years ago

@svdavid and I talked about this offline. The plan is for as_continuous, extract_epochs, etc. to return the data as-is: if the signal was normalized, the data will be returned normalized. If a caller specifically requires un-normalized or normalized data, it will say so explicitly using a normalized parameter. Valid arguments for the parameter will be:

None: Return the signal as-is (normalized if it was normalized, un-normalized if it wasn't).
False: Return the signal un-normalized. If it was normalized, denormalize it first.
'minmax': Return the signal normalized using this approach.
'<other types of normalization>': Return the signal normalized using that approach.
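As a minimal sketch of that dispatch (written as a free function for illustration; the signal attributes _data, _d, _g, and norm_method are assumptions, not the actual NEMS0 internals):

import numpy as np

def as_continuous(signal, normalized=None):
    if normalized is None:
        # return the data as stored: normalized if it was normalized,
        # un-normalized if it was not
        return signal._data.copy()
    if normalized is False:
        # always return un-normalized data, undoing any stored
        # normalization (raw = normalized * gain + offset)
        if signal.norm_method is None:
            return signal._data.copy()
        return signal._data * signal._g + signal._d
    # 'minmax', 'meanvar', ...: return data normalized with the requested
    # method, recomputing on the fly if the stored method differs
    if normalized == signal.norm_method:
        return signal._data.copy()
    raw = as_continuous(signal, normalized=False)
    if normalized == 'minmax':
        d = np.nanmin(raw, axis=1, keepdims=True)
        g = np.nanmax(raw, axis=1, keepdims=True) - d
    else:  # 'meanvar'
        d = np.nanmean(raw, axis=1, keepdims=True)
        g = np.nanstd(raw, axis=1, keepdims=True)
    g = np.where(g == 0, 1, g)
    return (raw - d) / g

The important property is that the default (normalized=None) never surprises the caller: it simply hands back whatever is stored.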

svdavid commented 6 years ago

Everything is basically working, except the quick-and-dirty plot functions. Maybe those can wait for another day.