Closed: jeremymanning closed this 7 years ago
would also be cool to support loading in remote datasets, e.g. from http://memory.psych.upenn.edu/Data_Archive
(someday)
cool idea, sort of like a pandas df. does everything we want to do start with the recall matrix? also, how do we want to handle multiple subjects? does the data object represent a group of subjects, or one?
I was thinking it could be a pandas dataframe set up as follows (a single dataframe can reflect data from 1 or more subjects):
each row is a single list and each column is a list position.
the row indices are used to aggregate the data by unique index, e.g. all of the rows with a given index will be analyzed together by the analysis function and the result will be stored in a single output row.
the columns of recmat label the list position (for serial position, p(first recall), etc.) or the relative list position (lag CRP). we can also have non-numerical columns (e.g. dimensions for memory fingerprints).
analysis functions take in a recmat dataframe (with row indices potentially repeated), and group the rows by unique index. a new dataframe is returned with one row per unique index.
plotting functions take in a recmat dataframe and plot the average within each column. we first average within each unique group as defined by the row indices, then across groups (e.g. there could be different numbers of observations within each group, but each group should count equally in the final average). if the column labels are numerical, we show a line plot (plot the numbers in order, connect the dots for each group). both the individual lines for each group and the across-group average should be shown. if the column labels are non-numerical, we show a bar plot instead (each group appears as a dot and the bar height reflects the average across groups).
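to make the layout described above concrete, here's a minimal sketch using pandas. it is not a final implementation: the function names `analyze` and `group_average`, the subject labels, and the toy data are all made up for illustration, and the mean stands in for whatever a real analysis function would compute.

```python
import pandas as pd

# Hypothetical recmat: one row per list, one column per list position.
# The row index labels the group (here, a subject); repeated labels mark
# rows that should be analyzed together.
recmat = pd.DataFrame(
    [[1, 0, 1],
     [1, 1, 1],
     [0, 0, 1]],
    index=["subj1", "subj1", "subj2"],
    columns=[1, 2, 3],
)


def analyze(recmat):
    """Group rows by unique index and return one output row per group.

    The mean is only a placeholder for a real analysis.
    """
    return recmat.groupby(level=0).mean()


def group_average(recmat):
    """Average within each group first, then across groups.

    This way each group counts equally, however many lists it contributed.
    """
    return analyze(recmat).mean(axis=0)


per_group = analyze(recmat)      # one row each for subj1 and subj2
overall = group_average(recmat)  # one value per list position
```

note that `overall` differs from a plain `recmat.mean()` here, because subj1 contributed two lists and subj2 only one.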
we want the data structure to support (at least) the following scenarios:
sounds great. I think we'll want to set up a class, and then attach attributes (the data, other info) and methods (analysis and plotting functions) to it. Here is a little primer i found on this style of coding in python: https://jeffknupp.com/blog/2014/06/18/improve-your-python-python-classes-and-object-oriented-programming/
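a rough sketch of what such a class could look like, assuming the pandas layout discussed in this thread. the class name `RecallData`, its attribute names, and its method signatures are placeholders, not a settled API. the `plot` method relies on pandas' built-in plotting, which needs matplotlib installed.

```python
import pandas as pd


class RecallData:
    """Hypothetical container: the recall matrix plus any other info."""

    def __init__(self, recmat, info=None):
        self.recmat = recmat      # rows = lists, columns = list positions
        self.info = info or {}    # free-form extra info (assumed to be a dict)

    def analyze(self, func="mean"):
        """Aggregate rows that share an index into one row per group."""
        return self.recmat.groupby(level=0).agg(func)

    def plot(self, ax=None):
        """Line plot with one line per group (requires matplotlib)."""
        # Transpose so list positions run along the x-axis.
        return self.analyze().T.plot(ax=ax)


# Usage with toy data (labels and values are made up)
data = RecallData(
    pd.DataFrame(
        [[1, 0, 1], [1, 1, 1], [0, 0, 1]],
        index=["subj1", "subj1", "subj2"],
        columns=[1, 2, 3],
    ),
    info={"experiment": "demo"},
)
result = data.analyze()
```

keeping the data and the functions that act on it in one object means the analysis and plotting methods can assume the same row/column conventions.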
closing...open new issue to support remote datasets
i think we should define a data object (e.g. recalls matrix + some additional info). then we can have two types of functions:
we could depend on hypertools for plotting trajectories and seaborn for other stuff