Interpolating data - Githubissues

jtmbeta commented 8 years ago

Hi there!

First, I'd like to thank you again for a great package. It has been incredibly helpful in terms of getting work done, but also a great introduction to scripting in Python.

Now my issue... When I reconstruct pupil data with the interp_eyelink_blinks() function, and the interp_zeros() function, I'd like to know how many samples were actually altered. Is there an easy way of getting a read-out of this information?

Also, I'd like to be able to explain how the data interpolation works. 'interp_zeros()' is straightforward, but 'interp_eyelink_blinks()' is a bit more complicated. I've noticed that you adjust the start and endpoints of the EBLINK events, but I haven't figured out just how. In one of your comments, you also say 'As per the eyelink documentation...'. Could you tell me which documents you refer to?

Finally, I have not noticed any citation instructions. Please let me know if you have a preferred method of acknowledgement when it comes to publication.

Many thanks,
Joel

beOn commented 8 years ago

Hey, good to hear from you! I'm in the lab at the moment, and will take a look at this on Sunday. One thing I can address now, though: the eyeblink documentation I'm referring to is the "EyeLink 1000 User Manual version (9/13/2007)," page 108:

Blinks are always embedded in saccades, caused by artificial motion as the eyelids progressively occlude the pupil of the eye. Such artifacts are best eliminated by labeling any SSACC...ESACC pair with one or more SBLINK events between them as a blink, not a saccade. The data contained in the ESACC event will be inaccurate in this case, but the , , and data will be accurate.

It is also useful to eliminate any short (less than 120 millisecond duration) fixations that precede or follow a blink. These may be artificial or be corrupted by the blink.

I mark everything within a blink-surrounding "saccade" as part of the blink.

It would be easy enough to alter the interp_ methods to return the number of changed indices. If you need it quickly, you could keep the old data around, then count how many non-identical entries there are in the interpolated data.
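As a rough sketch of the "keep the old data around" approach: the helper below, `count_changed`, is a hypothetical name and not part of cili, and the toy frame only stands in for a real samples DataFrame. It assumes both frames share the same index and that the field is numeric.

```python
import numpy as np
import pandas as pd

def count_changed(before: pd.DataFrame, after: pd.DataFrame, field: str) -> int:
    """Count samples in `field` whose value differs between the two frames.

    Hypothetical helper (not part of cili). Assumes both frames have the
    same length and ordering.
    """
    a = before[field].to_numpy(dtype=float)
    b = after[field].to_numpy(dtype=float)
    # Treat NaN in both frames as "unchanged"; a plain != would flag every NaN pair.
    same = (a == b) | (np.isnan(a) & np.isnan(b))
    return int((~same).sum())

# Toy data standing in for a samples frame (values are made up).
raw = pd.DataFrame({"pup_l": [1000.0, 0.0, 0.0, 1030.0, 1040.0]})

# Mimic a zero-interpolation step: mask zeros, then fill linearly.
cleaned = raw.copy(deep=True)
cleaned.loc[cleaned["pup_l"] == 0, "pup_l"] = np.nan
cleaned = cleaned.interpolate(method="linear", axis=0)

n_changed = count_changed(raw, cleaned, "pup_l")
print(n_changed)  # 2
```

One caveat with this approach: a sample whose interpolated value happens to equal its original value will not be counted, so the result is a lower bound on the number of samples that were masked.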

Ben

beOn commented 8 years ago

For getting the number of samples interpolated, take a look at line 118 of cleanup.py. In fact, here's the whole function, minus its documentation:

def mask_eyelink_blinks(samples, events, mask_fields=["pup_l"], find_recovery=True):
    samps = samples.copy(deep=True)
    indices = get_eyelink_mask_idxs(samps, events, find_recovery=find_recovery)
    samps.loc[indices, mask_fields] = float('nan')
    return samps

The length of the variable indices will tell you the number of samples being interpolated. Perhaps toss a line in there that prints out its length, or alter the function to return (samps, len(indices)), change the call in interp_eyelink_blinks to get both return values, then return them, like so:

(samps, count) = mask_eyelink_blinks(samples, events, mask_fields=interp_fields, find_recovery=find_recovery)
samps = samps.interpolate(method="linear", axis=0, inplace=False)
return samps, count

Alternatively, you could just call get_eyelink_mask_idxs on the dataset and take the length of the return value. Similarly, to get the number of zeros being interpolated, you can just get the length of `samps[samps[f] == 0]` before calling interp_zeros.
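For the zero count, a minimal sketch might look like the following. `count_zeros` is a hypothetical helper, the column names `pup_l` and `pup_r` are assumed for illustration, and the toy frame stands in for a real samples DataFrame.

```python
import pandas as pd

def count_zeros(samps: pd.DataFrame, fields) -> dict:
    """Return the number of zero-valued samples per field.

    Hypothetical helper (not part of cili); run it before interp_zeros
    so the zeros are still present in the data.
    """
    return {f: int((samps[f] == 0).sum()) for f in fields}

# Toy data with made-up values; real data would come from the samples frame.
samps = pd.DataFrame({
    "pup_l": [980.0, 0.0, 0.0, 1010.0],
    "pup_r": [975.0, 0.0, 990.0, 1000.0],
})

counts = count_zeros(samps, ["pup_l", "pup_r"])
print(counts)  # {'pup_l': 2, 'pup_r': 1}
```

Summing the boolean mask gives the same number as `len(samps[samps[f] == 0])` without building the filtered frame.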

As for the blink interpolation, it changes depending on whether you use find_recovery. If you don't, we interpolate all samples within BLINK events, and within SACC events that contain a BLINK event (this is what I described in my previous message).
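To illustrate the rule just described (without find_recovery), here is a small sketch in plain Python. It is not cili's implementation: `blink_mask_intervals` is a hypothetical function, and events are assumed to be simple (start, end) pairs in sample time.

```python
def blink_mask_intervals(blinks, saccades):
    """Return the (start, end) spans to mask for each blink.

    Hypothetical sketch, not cili's code: each blink span is widened to
    cover any saccade that fully contains it, so the whole
    blink-surrounding "saccade" is treated as part of the blink.
    """
    spans = []
    for b_start, b_end in blinks:
        start, end = b_start, b_end
        for s_start, s_end in saccades:
            # Only saccades that contain the blink are merged into it.
            if s_start <= b_start and b_end <= s_end:
                start, end = min(start, s_start), max(end, s_end)
        spans.append((start, end))
    return spans

# Made-up event times: the first blink sits inside a saccade, the second does not.
blinks = [(120, 260), (900, 1010)]
saccades = [(100, 290), (500, 530)]

spans = blink_mask_intervals(blinks, saccades)
print(spans)  # [(100, 290), (900, 1010)]
```

The second blink keeps its own bounds because no saccade contains it, and the saccade at (500, 530) is ignored because it contains no blink.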

If you DO use find_recovery, we additionally apply adjust_eyelink_recov_idxs to try to find the 'true' end of the blink. I put this in because a few samples that were still noticeably part of the blink artifact would often be left over after interpolation. It is worth interpolating a dataset with and without this option and comparing the results, so you can get a sense of what this function is doing. In many cases the adjustment is very small or makes no difference at all, and you can limit how large an adjustment is possible by altering zthresh, window, and kernel_size. The method is a little weird: we look at the average of the z-scored rate of change within a window, and find the first index where the rate of change is <10% of that average, but we only look so far. I'm open to suggestions, but for a simple implementation this was the most effective approach I found. For more details, check out the body of adjust_eyelink_recov_idxs and its comments.
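The general idea can be sketched roughly as below. This is a loose illustration of the description above and not the actual body of adjust_eyelink_recov_idxs: the function name `find_recovery_end`, its parameters `window` and `frac`, and the scaling step are all assumptions, and the real function's zthresh and kernel_size handling is not reproduced.

```python
import numpy as np

def find_recovery_end(signal, blink_end, window=20, frac=0.1):
    """Guess the index at or after `blink_end` where the signal has settled.

    Hypothetical sketch, not cili's implementation.
    """
    x = np.asarray(signal, dtype=float)
    rate = np.abs(np.diff(x))  # rate[i] = |x[i + 1] - x[i]|
    sd = rate.std()
    # Scale by the standard deviation (a simplification of z-scoring).
    z = rate / sd if sd > 0 else np.zeros_like(rate)
    # Only look `window` samples past the reported end of the blink.
    seg = z[blink_end:blink_end + window]
    if seg.size == 0:
        return blink_end
    # First sample whose rate of change is below `frac` of the window average.
    below = np.flatnonzero(seg < frac * seg.mean())
    return blink_end + int(below[0]) if below.size else blink_end

# Made-up pupil trace: a dip, then a gradual recovery to baseline.
trace = [1000, 1000, 1000, 400, 300, 600, 850, 950, 990, 1000, 1000, 1000, 1000]

adjusted = find_recovery_end(trace, blink_end=5)
print(adjusted)  # 9
```

With a small window the search can run out before the trace settles, in which case this sketch falls back to the original end index; for example, `find_recovery_end(trace, blink_end=5, window=3)` returns 5.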

Hope that addresses your questions - I'll get back to you about citations soon.

beOn commented 8 years ago

For the time being, you can cite this as:

Acland BT, Braver TS (2014). Cili (v0.5.3) [Software] Available from https://github.com/beOn/cili

Feel free to update the url and/or version to point to a different commit.

We'll have a doi for it with the next release.

jtmbeta commented 8 years ago

Yes, that addresses the questions. Thank you very much!

beOn commented 8 years ago

I've updated the citation instructions with a doi.
