feature request: indexing flags

jeremymanning commented 7 years ago

To facilitate doing analyses separated by experiment and list number, I propose adding two flags:

subjgroup provides a per-subject group label (e.g. labeling which experiment each subject participated in). analysis functions carry out analyses separately for each group, and plotting functions show a different curve for each group (e.g. in a different color). for example, this will allow us to easily do analyses for different experiments, in a single command.
listgroup provides a per-list label (independent of subjects). analysis functions carry out analyses separately for each listgroup (in addition to each subjgroup), and plotting functions show a different curve for each listgroup (e.g. in a different color). for example, this will allow us to easily do analyses for the first 8 vs last 8 lists, see how memory fingerprints change over the course of the experiment, and other similar sorts of things, also in a single command.

The way I'm imagining this would be implemented is to first divide the data based on subjgroup, passing in the listgroup flags to the function that carries out analyses for each subject. Then inside the subject-level analysis function, analyses would be performed separately on each listgroup.

An efficient way to set this up would be to have a "general" analysis function that takes in a pyro object and an (analysis) function handle. The general analysis function would then have an outer loop over subjgroup and an inner loop over the listgroups for the current subjgroup. It would call the analysis function handle to do the actual analysis on that subjgroup-listgroup piece of the data, and then aggregate the results in some way (e.g. by returning a new dataframe or pyro object).
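A minimal sketch of that nested loop, in case it helps make the proposal concrete. The nested-list data layout (a list of subjects, each a list of per-list recalls), the dict return value, and the toy `mean_recalled` analysis are all assumptions for illustration; the real pyro object and aggregation format are still to be decided. It also assumes every subject has the same number of lists.

```python
def analyze(data, analysis, subjgroup=None, listgroup=None):
    """Apply `analysis` to each (subject group, list group) chunk of `data`.

    `data` is a list of subjects; each subject is a list of per-list records.
    This nested-list layout is only a stand-in for the pyro object.
    """
    n_subjects = len(data)
    n_lists = len(data[0]) if data else 0
    # With no labels given, everything falls into a single group.
    subjgroup = subjgroup if subjgroup is not None else ["all"] * n_subjects
    listgroup = listgroup if listgroup is not None else ["all"] * n_lists
    if len(subjgroup) != n_subjects:
        raise ValueError("subjgroup needs one label per subject")
    if len(listgroup) != n_lists:
        raise ValueError("listgroup needs one label per list")

    results = {}
    # Outer loop: subject groups (dict.fromkeys keeps first-seen order).
    for sg in dict.fromkeys(subjgroup):
        subjects = [s for s, label in zip(data, subjgroup) if label == sg]
        # Inner loop: list groups within the current subject group.
        for lg in dict.fromkeys(listgroup):
            chunk = [
                [lst for lst, label in zip(subject, listgroup) if label == lg]
                for subject in subjects
            ]
            results[(sg, lg)] = analysis(chunk)
    return results


def mean_recalled(chunk):
    """Toy analysis: mean number of recalled items per list in the chunk."""
    counts = [len(lst) for subject in chunk for lst in subject]
    return sum(counts) / len(counts)


# Two subjects, four lists each; every inner list holds the recalled items.
data = [
    [["a", "b"], ["c"], ["d", "e", "f"], ["g"]],
    [["a"], ["b", "c"], ["d"], ["e", "f", "g"]],
]
out = analyze(
    data,
    mean_recalled,
    subjgroup=["exp1", "exp2"],
    listgroup=["early", "early", "late", "late"],
)
```

Here `out` is keyed by `(subjgroup label, listgroup label)`, so a plotting function could loop over the same keys.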

Plotting could work similarly-- we could have a general plotting function that takes in a results object and a (plot) function handle and loops over subjgroup and listgroup, adding each piece of the data in the innermost loop (listgroup of the subjgroup) to the current plot (or aggregating the results in a way that could easily be plotted at the end).

andrewheusser commented 7 years ago

Sounds good to me, but not totally clear on what form the args would take, and where they would be implemented. Are they required args to create the pyro object? or are they passed into an analysis/plotting function?

The form of the args could be a list of string/int labels, with one label per subject for subjgroup and one label per list for listgroup.

@KirstensGitHub since you are leading the dev of this package, how does this sound to you?

jeremymanning commented 7 years ago

I was thinking they would not be required to create the pyro object. I was thinking that the analysis and plotting functions would support these flags. So you'd do something like:

data = <make pyro object>
explabels = <a list of length n_subjects saying which experiment each subject participated in>
listlabels = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]  # groups the first 8 and last 8 lists
x = spc(data, subjgroup=explabels, listgroup=listlabels)  # creates 2 serial position curves for each experiment: 1 for the first 8 lists and 1 for the last 8 lists
plot_spc(x, listgroup=listlabels)  # combines across all experiments and plots 2 serial position curves (for the first vs. last 8 lists). If we had included subjgroup, more curves would have been plotted

To implement spc, I'm thinking we could have that function wrap another function (e.g. analyze) that takes subjgroup, listgroup, and spc_helper as arguments-- where spc_helper computes a serial position curve for a pyro object, and analyze breaks down the pyro object by subjgroup and listgroup and calls spc_helper for each chunk of the data.

(And similarly for plot_spc, which would internally call a general plot function that takes as arguments subjgroup, listgroup, and spc_plot_helper, where spc_plot_helper adds a serial position curve to the current axes, and plot calls spc_plot_helper for each chunk of data, divided by subjgroup and listgroup.)

Then another function, say crp, would also wrap analyze, but instead of spc_helper it would pass in crp_helper, which computes a CRP curve for a chunk of data.
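To illustrate the wrapper pattern, here is a rough sketch in which spc and crp differ only in the helper they hand to analyze. The function names come from this thread, but everything else is an assumption: data is a nested list of recall sequences given as 0-based study positions, list_length is a hypothetical extra parameter, each recall sequence is assumed to contain no repeats or intrusions, and results come back as a plain dict.

```python
from functools import partial


def analyze(data, helper, subjgroup, listgroup):
    """Split `data` by both label sets and call `helper` on each chunk."""
    chunks = {}
    for subject, sg in zip(data, subjgroup):
        for recalls, lg in zip(subject, listgroup):
            chunks.setdefault((sg, lg), []).append(recalls)
    return {key: helper(chunk) for key, chunk in chunks.items()}


def spc_helper(chunk, list_length):
    """Serial position curve: proportion of lists recalling each position."""
    return [
        sum(pos in recalls for recalls in chunk) / len(chunk)
        for pos in range(list_length)
    ]


def crp_helper(chunk, list_length):
    """Lag-CRP: transitions made at each lag / transitions possible at that lag.

    Assumes recall sequences hold valid positions with no repeats.
    Lags that were never available are reported as None.
    """
    lags = range(-(list_length - 1), list_length)
    actual = dict.fromkeys(lags, 0)
    possible = dict.fromkeys(lags, 0)
    for recalls in chunk:
        seen = set()
        for prev, nxt in zip(recalls, recalls[1:]):
            seen.add(prev)
            # Every not-yet-recalled position was an available transition.
            for candidate in range(list_length):
                if candidate not in seen:
                    possible[candidate - prev] += 1
            actual[nxt - prev] += 1
    return {
        lag: (actual[lag] / possible[lag] if possible[lag] else None)
        for lag in lags
        if lag != 0
    }


def spc(data, list_length, subjgroup, listgroup):
    """Thin wrapper: serial position curves per group."""
    helper = partial(spc_helper, list_length=list_length)
    return analyze(data, helper, subjgroup, listgroup)


def crp(data, list_length, subjgroup, listgroup):
    """Thin wrapper: lag-CRP curves per group."""
    helper = partial(crp_helper, list_length=list_length)
    return analyze(data, helper, subjgroup, listgroup)


# Two subjects, two lists each, three items per list (positions 0..2).
data = [
    [[0, 1, 2], [2, 1]],
    [[0, 1], [2]],
]
labels = dict(subjgroup=["exp1", "exp1"], listgroup=["early", "late"])
spc_out = spc(data, 3, **labels)
crp_out = crp(data, 3, **labels)
```

Adding a new analysis would then only mean writing one more helper and a two-line wrapper.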

Does this make sense?

andrewheusser commented 7 years ago

yup makes sense and sounds like a great structure.

@KirstensGitHub on board with this?

KirstensGitHub commented 7 years ago

hey, sorry, was just reading through everything slowly. all sounds good! is the reasoning for passing the metadata to the analysis/plotting functions rather than including in the object simply because these are the functions that actually need/use it?

jeremymanning commented 7 years ago

@KirstensGitHub -- yeah, exactly. the indices should be passed only at the stage they are actually used. this is a more flexible design than including them in the pyro object directly. for example, suppose we wanted to create three sets of plots:

1.) plot a curve for each experiment (combining across lists)
2.) plot a curve for each list (combining across experiments)
3.) plot a curve for each list/experiment (combining across subjects within each experiment)

with the design i've proposed above, you can use the same pyro object and simply change the subjgroup and listgroup flags to do each analysis. if the flags had been incorporated into the pyro object directly, we'd need to maintain 3 copies of the pyro object.
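As a small illustration of that point, the sketch below runs all three groupings on one unchanged data table just by changing the flags. The `grouped_means` function and the per-subject, per-list accuracy table are hypothetical stand-ins, not the package's API or data format.

```python
def grouped_means(acc, subjgroup=None, listgroup=None):
    """Average acc[subject][list] within each (subjgroup, listgroup) cell."""
    # A missing flag means "combine across that dimension".
    subjgroup = subjgroup or ["all"] * len(acc)
    listgroup = listgroup or ["all"] * len(acc[0])
    cells = {}
    for row, sg in zip(acc, subjgroup):
        for value, lg in zip(row, listgroup):
            cells.setdefault((sg, lg), []).append(value)
    return {key: sum(vals) / len(vals) for key, vals in cells.items()}


# Rows are subjects, columns are lists (made-up accuracies).
acc = [
    [0.5, 0.75],
    [0.25, 0.5],
    [1.0, 0.5],
    [0.5, 1.0],
]
explabels = ["exp1", "exp1", "exp2", "exp2"]
listlabels = ["early", "late"]

# 1.) one value per experiment, combining across lists
by_experiment = grouped_means(acc, subjgroup=explabels)
# 2.) one value per list group, combining across experiments
by_list = grouped_means(acc, listgroup=listlabels)
# 3.) one value per list group within each experiment
by_both = grouped_means(acc, subjgroup=explabels, listgroup=listlabels)
```

The `acc` table is never copied or relabeled; only the arguments change between the three calls.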

KirstensGitHub commented 7 years ago

ah, that makes a lot of sense :) I was thinking over the weekend that it might potentially make sense to include the recall_matrix in the object (though this wasn't part of @andrewheusser 's original outline for the object) since I think it gets used for basically all of the subsequent analyses. is that reasonable or should the object be purely raw subject data?

jeremymanning commented 7 years ago

I think the cleanest implementation would be to have the pyro object contain only the subject data (processed into a convenient format). we should have a helper function to compute a recalls matrix from a pyro object, but we should follow the same philosophy as the indexing-- to the extent possible, we should only include an argument or do an analysis at the stage it's needed.

to expand on the logic, we may have analyses or plots that don't require the recalls matrix, so we don't need to have the recalls matrix be part of the data representation.

on the other hand, we do want a generally accessible function for creating the recall matrices, since it'll be called by many of the analyses.
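For concreteness, a helper along these lines might look like the sketch below. The function name, the word-list inputs, the 1-based positions, and the use of None for intrusions are all assumptions made for illustration, not an agreed format.

```python
def recall_matrix(presented, recalled):
    """For each list, map each recalled word to its 1-based study position.

    Words that were not presented on that list (intrusions) become None.
    """
    matrix = []
    for pres, rec in zip(presented, recalled):
        position = {word: i + 1 for i, word in enumerate(pres)}
        matrix.append([position.get(word) for word in rec])
    return matrix


# Two lists of three presented words each, with the words recalled in order.
presented = [["cat", "dog", "bat"], ["sun", "map", "key"]]
recalled = [["bat", "cat"], ["key", "owl", "sun"]]
matrix = recall_matrix(presented, recalled)
```

Analyses that need the matrix can call the helper themselves, and analyses that don't never pay for it.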

note: a potential exception to the rule "wait until the last possible moment to pass in an argument or do a computation" is for computations that take a long time to carry out. for those, it makes sense to do whatever possible to ensure they are carried out as few times as possible. but for the sorts of analyses we're discussing, everything is going to be super quick and we don't need to worry much about compute times.

andrewheusser commented 7 years ago

def spc(pyro, listlabels=None, subjlabels=None):
    '''run serial position analysis'''

def analyze(data, listlabels=None, subjlabels=None, analysis=None):
    '''performs averaging according to list labels and subj labels'''

def spc_helper(pres_for_a_single_list, rec_for_a_single_list):
    '''computes spc for a single list'''

@jeremymanning just to confirm, is this what you had in mind?

jeremymanning commented 7 years ago

Yes... But can we call them subjgroup and listgroup?

andrewheusser commented 7 years ago

oops! typo - yep no prob

KirstensGitHub commented 7 years ago

thanks @andrewheusser for clarifying, looks good :)

andrewheusser commented 7 years ago

@jeremymanning @KirstensGitHub can I get your feedback on this? This is currently how you use the listgroup/subjgroup args. The data here is a group of 6 subjects. Default behavior is to plot the average over the group:

spc = pyr.spc(pyro, listgroup=['average']*16)
pyr.plot(spc)

Or if it's split by list, it will split during plotting:

spc = pyr.spc(pyro, listgroup=['early']*8+['late']*8)
pyr.plot(spc)

If plot_type is set to subject, it will plot a line for each subject:

pyr.plot(spc, plot_type='subject')

and you can group by subject using the subjgroup arg:

pyr.plot(spc, plot_type='subject', subjgroup=['exp1','exp1','exp2','exp2','exp3','exp3'])

Finally, to split on both variables, you can pass plot_type='grid', which will plot separately by list and subject group:

pyr.plot(spc, plot_type='grid', subjgroup=['exp1','exp1','exp2','exp2','exp3','exp3'])

Let me know if you think this works, or if you'd like to see any changes. Sidenote - There is currently 1 plot function that is aware of the analysis type and makes changes accordingly. We could split this into multiple plotting functions, but that didn't seem necessary at this point.

KirstensGitHub commented 7 years ago

Hmm, looking at these, I guess it would make more sense to plot together by subject in the last case and plot separately by subject when we're looking at the average over all lists for each subject....

andrewheusser commented 7 years ago

@KirstensGitHub not sure what you mean exactly, can you elaborate?

jeremymanning commented 7 years ago

@KirstensGitHub are you referring to plot_type='grid' vs. plot_type='subject'? my understanding is that the plot type is a user option that you can set using the plot_type flag...?

@andrewheusser as a very minor change, I'd change the words "Subject" and "List" plot to be "Subject group" and "List group" when the subjgroup and listgroup flags are used, respectively. I'd also change the "|" (pipe) to a ", " (comma + space) for the titles in the "grid" plot at the end.

KirstensGitHub commented 7 years ago

ah, so pyr.plot(spc, plot_type='subject') would give each individual subject in a separate plot using facet grid?

jeremymanning commented 7 years ago

@KirstensGitHub no...i think pyr.plot(spc, plot_type='subject') would give each subject (or subject group) a curve on a single plot, whereas pyr.plot(spc, plot_type='grid') would have each subject (or subject group) in its own plot, with all the plots arranged in a grid. (is this right @andrewheusser?)

andrewheusser commented 7 years ago

yep, that's correct

KirstensGitHub commented 7 years ago

oops! I meant to say grid instead of subject. anyways, got it, thanks

jeremymanning commented 7 years ago

👌

ContextLab / quail

feature request: indexing flags #6