Accounting for location/padding in the EM fails for now

GWeindel commented 1 year ago

Context

The cross-correlation used for the pattern analysis yields some padding in the signal

from scipy.signal import fftconvolve 
plt.plot(np.arange(15,45), init.template)
plt.plot(np.arange(59),fftconvolve(init.template,init.template,mode='full'))

Pasted image 20230424204419

This forces us to add a penalty to the samples that immediately follow a detected event. Otherwise all n_events gets squashed into one (or a subset gets squashed), to reproduce use a location of 0 when fitting a model

To do this two solutions:

Solution 1: Add a location parameter

        #in def __init__
        if location is None: 
            self.location = int(self.event_width / self.steps//2)+1
...
        # in def estim_probs()
        locations = np.concatenate([[0], np.repeat(self.location, n_events)])#all stages except first stage have a location (mainly to avoid overlap in cross-correlated signal) 
        locations[-1] = 0
        for stage in range(n_stages):
            pmf[:,stage] = self.distribution_pmf(parameters[stage,0], parameters[stage,1], locations[stage])
...
      # in distribution_pmf()
        p = self.cdf(np.arange(1, self.max_d+1), shape, scale=scale, loc=location)
...
     # in scale_parameters()
        params[:,1] = np.diff(averagepos, prepend=0)
        params[1:-1,1] -= self.location

Problems:

The true generating distributions don't have this location parameter for every stage except last and first ones. So by adding a location parameter we are not estimating the generating distribution
The other problem is that removing the location from the average position differences params[1:-1,1] -= self.location breaks working code and gives weird solutions on the existing datasets

Solution 2 : Censor first samples in the PDF of the time distribution

Instead of a location parameter, one can censor the samples that follow the event if it is a stage between two events. This has the advantage of modelling a time distribution without location and not require any additional step when computing the scale parameters from the average positions.

        # in def estim_probs()
        for stage in range(n_stages):
            pmf[:,stage] = self.distribution_pmf(parameters[stage,0], parameters[stage,1])
            if n_stages-1 > stage > 0:#all stages except first stage are censored on half a pattern (because of padding from cross-correlated signal)
                pmf[:int(self.location),stage]=0

This is currently under investigation

GWeindel commented 1 year ago

The implementation of Solution 2 requires more rewriting than shown here, to try out the solution one can switch to the padding_implementation branch

jelmerborst commented 1 year ago

Solution 2 sounds much less error-prone in the long run (and easier to interpret), so if that works it seems to preferred.

In several places you mention first and last stage, but only the first stage is exempted from this, right? For example locations[-1] = 0 needs to be removed, I think.

GWeindel commented 1 year ago

I agree that solution 2 is preferred.

About the last stage, previous implementations did include a location/padding for that stage, but I think it is an error: The last stage ends with the RT, so there is no pattern expected. Hence, I think this last stage should be left without location as it might generate weirdness to account for this location while having the end of the time serie.

jelmerborst commented 1 year ago

I might have fixed solution 1 with https://github.com/GWeindel/hsmm_mvpy/commit/1cc45bb99191e0c59852c90b2f48e8ef82b67d88.

I changed multiple things:

added a minimum duration of 1 after removing locations
generate parameters up to max_d instead of mean_d, removing a bias to earlier bumps
double the location parameter. I think it should be event_width, not event_width/2

However, many solution with random starting points do not converge.

GWeindel commented 1 year ago

For the location parameter, the padding in the cross-correlation justifies event_width/2 And it works (same simulation parameters as tutorial 2 but different seed):

init = hmp.models.hmp(hmp_dat, sfreq=epoch_data.sfreq, event_width=50, cpus=cpus, location=25)#Initialization of the model
estimates = init.fit()
hmp.visu.plot_topo_timecourse(epoch_data, estimates, info, init, 
                                times_to_display = np.mean(np.cumsum(random_source_times,axis=1),axis=0), magnify=1)

vs.

init = hmp.models.hmp(hmp_dat, sfreq=epoch_data.sfreq, event_width=50, cpus=cpus, location=25)#Initialization of the model
estimates = init.fit()
hmp.visu.plot_topo_timecourse(epoch_data, estimates, info, init, 
                                times_to_display = np.mean(np.cumsum(random_source_times,axis=1),axis=0), magnify=1)

GWeindel commented 1 year ago

The minimum duration might indeed be useful, I'll check that

GWeindel commented 1 year ago

Fixed by #102

GWeindel / hmp