Closed jeremymanning closed 7 years ago
Sounds good to me, but not totally clear on what form the args would take, and where they would be implemented. Are they required args to create the pyro object? or are they passed into an analysis/plotting function?
The form of the args could be a list of string/int labels: one per subject for subjgroup, and one per list for listgroup.
@KirstensGitHub since you are leading the dev of this package, how does this sound to you?
I was thinking they would not be required to create the pyro object. I was thinking that the analysis and plotting functions would support these flags. So you'd do something like:
data = <make pyro object>
explabels = <a list of length n_subjects saying which experiment each subject participated in>
listlabels = [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]  # groups the first 8 and last 8 lists
x = spc(data, subjgroup=explabels, listgroup=listlabels)
# creates 2 serial position curves for each experiment: 1 for the first 8 lists and 1 for the last 8 lists
plot_spc(x, listgroup=listlabels)
# combines across all experiments and plots 2 serial position curves (first vs. last 8 lists); if we had included subjgroup, more curves would have been plotted
To implement spc, I'm thinking we could have that function wrap another function (e.g. analyze) that takes subjgroup, listgroup, and spc_helper as arguments, where spc_helper computes a serial position curve for a pyro object, and analyze breaks down the pyro object by subjgroup and listgroup and calls spc_helper for each chunk of the data.
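A minimal sketch of that wrapper structure, under stated assumptions: the pyro object is stood in for by two hypothetical nested lists, pres[s][l] and rec[s][l] (presented and recalled words for subject s, list l), and the averaging rule (a plain mean over every list in a chunk) is just one possible choice, not something settled in this thread.

```python
from collections import defaultdict


def spc_helper(pres_list, rec_list):
    """Serial position curve for one list: 1.0 if the item at each position was recalled, else 0.0."""
    recalled = set(rec_list)
    return [1.0 if item in recalled else 0.0 for item in pres_list]


def analyze(pres, rec, subjgroup=None, listgroup=None, analysis=spc_helper):
    """Call `analysis` on every list and average the results within each (subjgroup, listgroup) chunk.

    pres[s][l] / rec[s][l] are hypothetical stand-ins for the pyro object.
    """
    if subjgroup is None:
        subjgroup = ["all"] * len(pres)      # default: combine all subjects
    if listgroup is None:
        listgroup = ["all"] * len(pres[0])   # default: combine all lists

    chunks = defaultdict(list)
    for s, sg in enumerate(subjgroup):       # outer loop over subjects (and their group label)
        for l, lg in enumerate(listgroup):   # inner loop over lists (and their group label)
            chunks[(sg, lg)].append(analysis(pres[s][l], rec[s][l]))

    # Position-wise mean over all curves that landed in the same chunk.
    return {key: [sum(col) / len(col) for col in zip(*curves)]
            for key, curves in chunks.items()}


def spc(pres, rec, subjgroup=None, listgroup=None):
    """Thin wrapper: the grouping logic lives in analyze, the per-list logic in spc_helper."""
    return analyze(pres, rec, subjgroup, listgroup, analysis=spc_helper)


# Toy data: 2 subjects x 2 lists x 3 words.
pres = [
    [["cat", "dog", "sun"], ["map", "key", "jar"]],
    [["cat", "dog", "sun"], ["map", "key", "jar"]],
]
rec = [
    [["cat", "sun"], ["jar"]],
    [["cat"], ["map", "jar"]],
]
curves = spc(pres, rec, subjgroup=["exp1", "exp2"], listgroup=["early", "late"])
```

With both label lists supplied, `curves` has one entry per (subjgroup, listgroup) pair; leaving either argument out collapses across that dimension.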
(And similarly for plot_spc, which would internally call a general plot function that takes as arguments subjgroup, listgroup, and spc_plot_helper, where spc_plot_helper adds a serial position curve to the current axes, and plot calls spc_plot_helper for each chunk of data, divided by subjgroup and listgroup.)
Then another function, say crp, would also wrap analyze, but instead of spc_helper it would pass in crp_helper, which computes a CRP curve for a chunk of data.
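As a rough illustration of what such a crp_helper might compute, here is a sketch assuming the usual lag-CRP definition: for each lag, the number of recall transitions actually made at that lag divided by the number of times that lag was still available. The signature (one presented list, one recalled list) and the handling of repeats and intrusions (they break the transition chain) are assumptions, and presented items are assumed to be unique.

```python
def crp_helper(pres_list, rec_list):
    """Lag-CRP for a single list: {lag: actual transitions / possible transitions}."""
    n = len(pres_list)
    position = {item: i for i, item in enumerate(pres_list)}
    lags = [lag for lag in range(-(n - 1), n) if lag != 0]
    actual = {lag: 0 for lag in lags}
    possible = {lag: 0 for lag in lags}

    recalled = set()   # serial positions recalled so far
    prev = None        # serial position of the previous valid recall
    for item in rec_list:
        pos = position.get(item)
        if pos is None or pos in recalled:
            # Intrusion or repeat: skip it and do not count transitions through it.
            prev = None
            continue
        if prev is not None:
            # Every not-yet-recalled position was a possible destination.
            for cand in range(n):
                if cand not in recalled:
                    possible[cand - prev] += 1
            actual[pos - prev] += 1
        recalled.add(pos)
        prev = pos

    # None marks lags that were never available in this list.
    return {lag: (actual[lag] / possible[lag] if possible[lag] else None)
            for lag in lags}


example = crp_helper(["a", "b", "c", "d"], ["b", "c", "a"])
```

Because some lags are never available in a given list, whatever averages these curves across lists would need to skip the None entries rather than treat them as zeros.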
Does this make sense?
yup makes sense and sounds like a great structure.
@KirstensGitHub on board with this?
hey, sorry, was just reading through everything slowly. all sounds good! is the reasoning for passing the metadata to the analysis/plotting functions rather than including in the object simply because these are the functions that actually need/use it?
@KirstensGitHub -- yeah, exactly. the indices should be passed only at the stage they are actually used. this is a more flexible design than including them in the pyro object directly. for example, suppose we wanted to create three sets of plots:
1.) plot a curve for each experiment (combining across lists)
2.) plot a curve for each list (combining across experiments)
3.) plot a curve for each list/experiment (combining across subjects within each experiment)
with the design i've proposed above, you can use the same pyro object and simply change the subjgroup and listgroup flags to do each analysis. if the flags had been incorporated into the pyro object directly, we'd need to maintain 3 copies of the pyro object.
ah, that makes a lot of sense :) I was thinking over the weekend that it might potentially make sense to include the recall_matrix in the object (though this wasn't part of @andrewheusser 's original outline for the object) since I think it gets used for basically all of the subsequent analyses.. is that reasonable or should the object be purely raw subject data?
I think the cleanest implementation would be to have the pyro object contain only the subject data (processed into a convenient format). we should have a helper function to compute a recalls matrix from a pyro object, but we should follow the same philosophy as the indexing-- to the extent possible, we should only include an argument or run an analysis at the stage it's needed.
to expand on the logic, we may have analyses or plots that don't require the recalls matrix, so we don't need to have the recalls matrix be part of the data representation.
on the other hand, we do want a generally accessible function for creating the recall matrices, since it'll be called by many of the analyses.
note: a potential exception to the rule "wait until the last possible moment to pass in an argument or do a computation" is for computations that take a long time to carry out. for those, it makes sense to do whatever possible to ensure they are carried out as few times as possible. but for the sorts of analyses we're discussing, everything is going to be super quick and we don't need to worry much about compute times.
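The standalone recall-matrix helper mentioned above could look something like the sketch below. The layout is an assumption rather than anything decided in this thread: one row per list, one column per output position, each cell holding the 1-indexed serial position of the recalled word, with -1 for intrusions and None as padding. The inputs are hypothetical plain lists, not the pyro object itself.

```python
def recall_matrix(pres, rec, intrusion=-1, pad=None):
    """Build a lists x output-position matrix of recalled serial positions.

    pres, rec: one inner list of words per study list (hypothetical input format).
    """
    width = max((len(r) for r in rec), default=0)   # longest recall sequence
    matrix = []
    for pres_list, rec_list in zip(pres, rec):
        position = {item: i + 1 for i, item in enumerate(pres_list)}
        row = [position.get(item, intrusion) for item in rec_list]
        row += [pad] * (width - len(row))           # pad short rows to a rectangle
        matrix.append(row)
    return matrix


pres = [["cat", "dog", "sun"], ["map", "key", "jar"]]
rec = [["sun", "cat"], ["jar", "owl", "map"]]
matrix = recall_matrix(pres, rec)
```

Keeping this as a free function means analyses that need the matrix call it on demand, and analyses that don't never pay for it.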
def spc(pyro, listlabels=None, subjlabels=None):
    '''run serial position analysis'''
def analyze(data=pyro, listlabels=listlabels, subjlabels=subjlabels, analysis=analysis_function):
    '''performs averaging according to list labels and subj labels'''
def spc_helper(pres_for_a_single_list, rec_for_a_single_list):
    '''computes spc for a single list'''
@jeremymanning just to confirm, is this what you had in mind?
Yes... But can we call them subjgroup and listgroup?
oops! typo - yep no prob
thanks @andrewheusser for clarifying, looks good :)
@jeremymanning @KirstensGitHub can I get your feedback on this? This is currently how you use the listgroup/subjgroup args. The data here is a group of 6 subjects. Default behavior is to plot the average over the group:
spc = pyr.spc(pyro, listgroup=['average']*16)
pyr.plot(spc)
Or if it's split by list, it will split during plotting:
spc = pyr.spc(pyro, listgroup=['early']*8+['late']*8)
pyr.plot(spc)
If plot_type is set to subject, it will plot a line for each subject:
pyr.plot(spc, plot_type='subject')
and you can group by subject using the subjgroup arg:
pyr.plot(spc, plot_type='subject', subjgroup=['exp1','exp1','exp2','exp2','exp3','exp3'])
Finally, to split on both variables, you can pass plot_type='grid', which will plot separately by list and subject group:
pyr.plot(spc, plot_type='grid', subjgroup=['exp1','exp1','exp2','exp2','exp3','exp3'])
Let me know if you think this works, or if you'd like to see any changes. Sidenote - There is currently 1 plot function that is aware of the analysis type and makes changes accordingly. We could split this into multiple plotting functions, but didn't seem necessary at this point.
Hmm, looking at these, I guess it would make more sense to plot together by subject in the last case and plot separately by subject when we're looking at the average over all lists for each subject....
@KirstensGitHub not sure what you mean exactly, can you elaborate?
@KirstensGitHub are you referring to plot_type='grid' vs. plot_type='subject'? my understanding is that the plot type is a user option that you can set using the plot_type flag...?
@andrewheusser as a very minor change, I'd change the words "Subject" and "List" plot to be "Subject group" and "List group" when the subjgroup and listgroup flags are used, respectively. I'd also change the "|" (pipe) to a ", " (comma + space) for the titles in the "grid" plot at the end.
ah, so pyr.plot(spc, plot_type='subject') would give each individual subject in a separate plot using facet grid?
@KirstensGitHub no...i think pyr.plot(spc, plot_type='subject') would give each subject (or subject group) a curve on a single plot, whereas pyr.plot(spc, plot_type='grid') would have each subject (or subject group) in its own plot, with all the plots arranged in a grid. (is this right @andrewheusser?)
yep, thats correct
oops! I meant to say grid instead of subject. anyways, got it, thanks
👌
To facilitate doing analyses separated by experiment and list number, I propose adding two flags:
subjgroup provides a per-subject group label (e.g. labeling which experiment each subject participated in). analysis functions carry out analyses separately for each group, and plotting functions show a different curve for each group (e.g. in a different color). for example, this will allow us to easily do analyses for different experiments, in a single command.
listgroup provides a per-list label (independent of subjects). analysis functions carry out analyses separately for each listgroup (in addition to each subjgroup), and plotting functions show a different curve for each listgroup (e.g. in a different color). for example, this will allow us to easily do analyses for the first 8 vs last 8 lists, see how memory fingerprints change over the course of the experiment, and other similar sorts of things, also in a single command.
The way I'm imagining this would be implemented is to first divide the data based on subjgroup, passing in the listgroup flags to the function that carries out analyses for each subject. Then inside the subject-level analysis function, analyses would be performed separately on each listgroup.
An efficient way to set this up would be to have a "general" analysis function that takes in a pyro object and an (analysis) function handle. The general analysis function would then have an outer loop over subjgroup and an inner loop over the listgroups for the current subjgroup. It would call the analysis function handle to do the actual analysis on that subjgroup-listgroup piece of the data, and then aggregate the results in some way (e.g. by returning a new dataframe or pyro object).
Plotting could work similarly-- we could have a general plotting function that takes in a results object and a (plot) function handle and loops over subjgroup and listgroup, adding each piece of the data in the innermost loop (listgroup of the subjgroup) to the current plot (or aggregating the results in a way that could easily be plotted at the end).