Increasing location - Githubissues

jelmerborst commented 8 months ago

Enjoy. Let me know if I can help!

jelmerborst commented 8 months ago

It prevents the estimation from stopping when the difference in subsequent iteration is still more than .1 sample. This works together with ‘get_locations’, which can increase the locations if sequential events move less than .1 sample and their correlation is high. It might not be required in the end (you could get their by experiment with the tolerance), but this ensures it works well together.

On 8 Mar 2024, at 18:14, GWeindel @.***> wrote:

@GWeindel commented on this pull request.

In hmp/models.py https://github.com/GWeindel/hmp/pull/129#discussion_r1518012550:

if self.location_corr_threshold is None: #standard threshold

if i >= min_iteration and (np.isneginf(lkh) or tolerance > (lkh-lkh_prev)/np.abs(lkh_prev)):

break

else: #threshold adapted for location correlation threshold:

EM only stops if location was not change on last iteration

and events moved less than .1 sample. This ensures EM continues

when new locations are set or correlation is still too high.

(see also get_locations)

stage_durations = np.array([self.scale_to_mean(x[0],x[1]) for x in parameters])

stage_durations_prev = np.array([self.scale_to_mean(x[0],x[1]) for x in parameters_prev])

if i >= min_iteration and (np.isneginf(lkh) or (locations == locations_prev).all() and tolerance > (lkh-lkh_prev)/np.abs(lkh_prev) and (np.abs(stage_durations-stage_durations_prev) < .1).all()): Why is (np.abs(stage_durations-stage_durations_prev) < .1).all()) needed?

— Reply to this email directly, view it on GitHub https://github.com/GWeindel/hmp/pull/129#pullrequestreview-1925382849, or unsubscribe https://github.com/notifications/unsubscribe-auth/AETDTH2KJ6U2JRD5VS7GYGDYXHWYJAVCNFSM6AAAAABDWYPKN2VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTSMRVGM4DEOBUHE. You are receiving this because you authored the thread.

GWeindel commented 8 months ago

Overall the PR is awesome, the implementation of the increasing location looks very good and useful. The new fit function also, the speed-up is quite important compared to main, less duplicates are found and the progress bar is very satisfying now!

I made some modifications to your PR. I here write the two main ones:

Deltas

The new way of computing deltas in the fit function generates a lot of duplicated events (preventing to find others in certain scenarios,e.g. plot below). Replacing it with the current version in main:

lkhs = self.sliding_event(fix_pars=True, fix_mags=True, method='max', verbose=False)[0]

gives less event and a noticeable speed-up. On this issue I would suggest we stick with the former version, also because I understand that way of doing so (bias in the likelihood due to the partition of the RT), not sure what to think of the comparison to a neutral fit. But maybe we'll come up with evidence for that way of doing things.

New way: Pasted image 20240308180409

vs Main way: Pasted image 20240308180120 (see also fit function in tutorial 2 where an event is found at sample 2 despite no simulated event there)

Locations

I noticed, at least for the fit function, the new locations have a value other than 0 in the last stage. I don't know exactly where the problem lies but I now force this last location to be 0 directly in the EM before each call to estim_probs.

I do this change as the last event should not have a location. While not major it does change some things in the fit function (e.g. one event less is found in accuracy data of tutorial 4).

GWeindel commented 8 months ago

If you agree with those changes we can merge :partying_face:

jelmerborst commented 8 months ago

sounds good! That location issue is odd – I'll check it out when I have time. The likelihood in fit is still an open issue to me - the bias in the likelihood due to the partition is different every time you add an event, if I remember correctly. But we can explore both issues later, maybe you can post issues for those. Thanks!

GWeindel / hmp

Increasing location #129

EM only stops if location was not change on last iteration

and events moved less than .1 sample. This ensures EM continues

when new locations are set or correlation is still too high.

(see also get_locations)

Deltas

Locations