dmirman-zz / gazer

Functions for reading and pre-processing eye tracking data.

Eliminate calls to plyr and reshape2 functions #4

Open dmirman-zz opened 5 years ago

dmirman-zz commented 5 years ago

Current version of this package seems to use both older (plyr, reshape2) and the newer tidyverse (dplyr, tidyr) packages. This causes occasional conflicts and is generally inelegant. We should streamline to only use the newer ones.
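For illustration, here is a hypothetical translation of a plyr-style aggregation into its dplyr equivalent (toy data frame, not the package's actual code — the column names are made up):

```r
library(dplyr)

# Toy data frame standing in for gaze data; column names are hypothetical
gaze <- data.frame(subject = rep(1:2, each = 3),
                   rt = c(350, 420, 390, 310, 500, 450))

# Old style, namespaced (loading plyr after dplyr masks summarise/rename,
# which is the kind of conflict this issue is about):
# plyr::ddply(gaze, "subject", plyr::summarise, mean_rt = mean(rt))

# tidyverse equivalent:
by_subj <- gaze %>%
  group_by(subject) %>%
  summarise(mean_rt = mean(rt))
```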

mwinn83 commented 5 years ago

I'm rather good with dplyr stuff. If you point me to the plyr calls, I can update them if you'd like.


dmirman-zz commented 5 years ago

That would be awesome. Here are the instances of plyr/reshape2 references that I was able to find:

dmirman-zz commented 5 years ago

Also, maybe you can help with the binify_fixations performance issue #5

mwinn83 commented 5 years ago

I'm actually comfortable with retaining the reshape2 melt functions and not converting to gather. I personally think gather is less transparent.
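For comparison, both forms on a toy wide-format data frame (hypothetical column names, not from the package):

```r
library(reshape2)
library(tidyr)

wide <- data.frame(subject = 1:2,
                   t1 = c(5.1, 4.8),
                   t2 = c(5.3, 4.9))

# reshape2: id variables named explicitly
long_melt <- reshape2::melt(wide, id.vars = "subject",
                            variable.name = "timebin", value.name = "pupil")

# tidyr: columns to stack given by exclusion
long_gather <- tidyr::gather(wide, timebin, pupil, -subject)
```

Both produce the same long-format content (melt returns the id column as a factor, gather as character by default).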

I cleaned up the missing pupil count, although I think it would be beneficial not to drop the missing data, but instead to use a variable to mark it to be dropped. I didn't want to make any radical changes before checking in with you, though.

For the rename functions, I was not sure which direction these go in. For dplyr rename, it's (new_name = old_name), but the way it's written here makes me wonder if it was written using a function where it's the opposite direction. I wanted to check with you first.
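For the record, the two packages do flip the direction; a minimal sketch on a made-up data frame:

```r
df <- data.frame(subject = 1:3)

# plyr: named vector, old name on the left
# plyr::rename(df, c("subject" = "Subject"))

# dplyr: named arguments, new name on the left
df2 <- dplyr::rename(df, Subject = subject)
```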

Preprocess pupil looks like it could use a major overhaul. The way I treat pupil data is dramatically more complicated than this, and the use of variable names (pup, pupil, baselinecorrectedp) is not consistent. When I get to the lab today I can send you a script that I use for pupil data processing, so you can see whether you want to include it in the package (more on that in a moment). In that script on GitHub, I cleaned up a few things, but I want to be careful about replacing the aggregate() function, since I never use it, and I want to make sure I replicate your intention. I'm 99% sure I know what it was, but I want to copy it here before overriding it on GitHub. Excuse any weird indentation spacing in the email window.

baseline <- pupil_interp %>%
  dplyr::filter(timebins > baseline_window[1],
                timebins < baseline_window[2]) %>%
  group_by(subject, trial) %>%
  summarise(baseline = median(interp, na.rm = TRUE)) %>%
  dplyr::rename(Subject = subject)

Okay, so about this script… I think it lacks some important features. My pupil data processing includes:

1 - Identifying which eye has less missing data, per trial, and creating a new column with a constant name for data from whichever eye is tracked

2 - Expand NA gaps (because blinks are accompanied by pupil size distortions before & after)

3 - Omit spotty samples that are not part of longer runs of data (e.g. a single sample way off track, in the middle of a string of NA values)

4 - Interpolate using a maximum rule

5 - Interpolate using no limit

6 - Low-pass filtering the no-limit interpolated values

7 - Copy the LPF data into a new variable, removing samples that were removed by the max-gap rule

8 - Calculate missing data

9 - Calculate baselines per trial

10 - Calculate baseline neighbors

11 - Calculate baseline 5-sample rolling average

12 - Baseline subtraction

13 - Calculate baseline proportional change

14 - Convert lots of columns from numeric to integer to save space

15 - Save that file along with missing data report (I realize this is not going to be part of our flow in the package)

16 - Decimate the data frame to include only samples whose timestamps are multiples of 40 ms (will work for sample rates of 250, 500, or 1000 Hz from the EyeLink)

17 - Save the decimated file

18 - Next, I identify contaminated missing data to be removed in a follow-up step using an interactive browser in shiny. I'm planning a way to do this automatically, though at this time some things cannot be done automatically. I'm talking about trials with excessive variance during the baseline period, and trials where the baseline-corrected value drops by a significant amount relative to the average for the rest of the trials (i.e., one of those situations where the baseline was abnormally high, and the corrected track drops way below the floor, screwing up the averaging). I also go in and brush away momentary distortions in the tracks, which can also screw up the averaging.

19 - After those steps are complete, I re-run the entire filtering script, calculating all the values after having them cleaned up by all the steps.
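To make a few of those steps concrete, here is a rough sketch of steps 4, 11, and 16 using zoo; the data, thresholds, and column names are made up, and the real script surely differs:

```r
library(dplyr)
library(zoo)   # na.approx() for interpolation, rollmean() for smoothing

# Toy single-trial trace at 250 Hz (4 ms samples); names are hypothetical
d <- data.frame(time  = seq(0, 76, by = 4),
                pupil = c(5, 5, NA, NA, 6, 6, 6, 6, 6, 6,
                          6, 6, 6, 6, 6, 6, 6, 6, 6, 6))

d <- d %>%
  mutate(
    # step 4: linear interpolation, but only across gaps of <= 2 samples
    interp = zoo::na.approx(pupil, maxgap = 2, na.rm = FALSE),
    # step 11: centered 5-sample rolling average
    smooth = zoo::rollmean(interp, k = 5, fill = NA))

# step 16: decimate to samples at multiples of 40 ms
decimated <- dplyr::filter(d, time %% 40 == 0)
```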

I realize that incorporating all of this would 1) be overkill for this package, and 2) add a lot of work at a time when you're trying to wrap it up before a conference. So let me know what you think would be best.

Matt


jgeller112 commented 5 years ago

Hi Matt,

Re the edited baseline code, that should work!

I am okay with you editing the pupil functions, and adding functionality that you deem necessary. I intended to reach out to you and ask you about important features. I am aware that there are many options to consider when dealing with this type of data, which is why I wanted to have this package. I think we should include only the steps that are essential.

jgeller112 commented 5 years ago

I might be in the minority, but I prefer gather over melt.

dmirman-zz commented 5 years ago

You two (Matt and Jason) are the experts on pupil data, so I leave it to you to decide how to handle these issues. I was hoping that this package would make our lives easier, so from that perspective, I think Matt would want it to implement his preprocessing pipeline (so his lab could use the package).

My sense from talking with Jason is that different people do different things, which makes it difficult to replicate analyses across labs. This package could help with that. Maybe we want to have separate functions for (at least some of) these preprocessing steps and a main function that defaults to this specific pipeline, but could have options to change some steps. Having those separate functions would also allow people to build their own preprocessing pipeline. I'm not inclined to guess at all the different options that someone might want; we can just implement what we think is good, but try to do it in a somewhat modular way to allow customization.

Regarding timing: the conference is in November, so we have time to do more work on this package. If you send that script, Jason and I can work on turning it into package functions.

mwinn83 commented 5 years ago

Sounds good. I have a lot of my functions separately, although they are not documented with standard R-package-style comments; I think that might not be a tremendous amount of work. We could even use a set of "include_step_X" input arguments to the main function, and invoke the steps conditionally in the corresponding pipeline. Other input arguments might require dots - Tristan might be the best to consult on that.
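A minimal sketch of that "include_step_X" idea (function and argument names are placeholders, not the package's API, and the step bodies here just pass the data through):

```r
# Each preprocessing step lives in its own function; placeholder bodies
interpolate_gaps <- function(x, max_gap = 10) x
lowpass_filter   <- function(x, cutoff = 4)  x

# Main wrapper runs the steps conditionally, in pipeline order
preprocess_pupil <- function(x,
                             include_interpolation = TRUE,
                             include_lowpass = TRUE) {
  if (include_interpolation) x <- interpolate_gaps(x)
  if (include_lowpass)       x <- lowpass_filter(x)
  x
}
```

This keeps each step independently callable while giving the main function a default pipeline, which matches the modular design Dan describes above.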

Matt
