ContextLab / quail

A python toolbox for analyzing and plotting free recall data
http://cdl-quail.readthedocs.io/en/latest/
MIT License

Analysis object class #82

Closed andrewheusser closed 5 years ago

andrewheusser commented 6 years ago

Currently, the output of any analysis is a dataframe with a field attached to it that specifies the type of analysis (e.g. accuracy, spc, etc.). This is how the plot function knows what to do with it. However, any manipulation that creates a new dataframe (which most pandas functions do) drops that field, so the plot function can't handle the new dataframe.

For that reason (and others), I think we may want to create a new analysis result class or, alternatively, append the analysis result onto the egg in some nice way. For instance, we could add an egg.analyses field containing a dictionary of the analyses that have been performed on the egg:

egg.analyses # None
egg = quail.analyze(egg, analysis='accuracy') # attach analyses dictionary to egg
egg.analyses['accuracy'] # a dataframe containing the results of the accuracy analysis

Then we could attach the plot function to the egg as well:

egg.plot('accuracy') # plots accuracy if it exists, else throws error

Similarly, the analyze function could be attached to the egg so we could perform complex analyses in one line:

egg.analyze('accuracy').plot('accuracy')
egg.analyze(['fingerprint', 'accuracy']).plot('accuracy')
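
The proposal above can be sketched roughly as follows. This is only an illustration under assumptions: `EggWithAnalyses`, its constructor arguments, and the simple accuracy calculation are hypothetical stand-ins rather than quail's actual classes, results are plain lists rather than dataframes, and `plot` prints text instead of drawing a figure. Only 'accuracy' is implemented here.

```python
class EggWithAnalyses:
    """Hypothetical stand-in for an egg that stores its own analysis results."""

    def __init__(self, pres, rec):
        self.pres = pres        # presented words, one list per study list
        self.rec = rec          # recalled words, one list per study list
        self.analyses = None    # nothing has been analyzed yet

    def _accuracy(self):
        # Proportion of presented words that were recalled, per study list.
        return [len(set(p) & set(r)) / len(p) for p, r in zip(self.pres, self.rec)]

    def analyze(self, analysis):
        # Accept a single analysis name or a list of names.
        names = [analysis] if isinstance(analysis, str) else list(analysis)
        supported = {'accuracy': self._accuracy}
        for name in names:
            if name not in supported:
                raise ValueError(f"analysis not implemented in this sketch: {name}")
        if self.analyses is None:
            self.analyses = {}
        for name in names:
            self.analyses[name] = supported[name]()
        return self  # returning the egg is what allows chaining

    def plot(self, analysis):
        # Raise an error if the requested analysis has not been run.
        if not self.analyses or analysis not in self.analyses:
            raise KeyError(f"no '{analysis}' analysis attached to this egg")
        for i, value in enumerate(self.analyses[analysis]):
            print(f"list {i}: {analysis} = {value:.2f}")  # placeholder for a real plot


egg = EggWithAnalyses(pres=[['cat', 'dog', 'bat', 'hat']], rec=[['bat', 'cat']])
egg.analyze('accuracy').plot('accuracy')
```

Because the results live on the egg rather than on a dataframe attribute, manipulating a copy of `egg.analyses['accuracy']` would not affect what `plot` can find.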

jeremymanning commented 6 years ago

What about having the egg.analyze and egg.plot functions only work on "raw" eggs (i.e. get rid of the idea of an "analyzed" egg)?

So to make a plot, you'd run something like:

dataframe = egg.plot('accuracy')

This would take the egg, run egg.analyze('accuracy'), create a plot, and return a pandas dataframe that could be analyzed further outside of quail. The returned object would no longer be an Egg object, though.

Or if you just wanted that dataframe, you could run:

dataframe = egg.analyze('accuracy')

This would behave the same as the first command, but wouldn't produce a plot.
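
A rough sketch of this "raw egg" alternative, under the same caveats: `RawEgg` and its accuracy calculation are hypothetical, a list of records stands in for the pandas dataframe, and printing stands in for plotting. The point is only that nothing is stored on the egg and both methods hand back a plain result.

```python
class RawEgg:
    """Hypothetical egg that never stores results; analyses are returned to the caller."""

    def __init__(self, pres, rec):
        self.pres = pres  # presented words, one list per study list
        self.rec = rec    # recalled words, one list per study list

    def analyze(self, analysis):
        if analysis != 'accuracy':
            raise ValueError(f"analysis not implemented in this sketch: {analysis}")
        # A list of records stands in for the dataframe that would be returned.
        return [
            {'list': i, 'accuracy': len(set(p) & set(r)) / len(p)}
            for i, (p, r) in enumerate(zip(self.pres, self.rec))
        ]

    def plot(self, analysis):
        result = self.analyze(analysis)  # always recomputed from the raw data
        for row in result:
            print(f"list {row['list']}: {analysis} = {row[analysis]:.2f}")  # placeholder plot
        return result  # same data that analyze() alone would return


egg = RawEgg(pres=[['cat', 'dog', 'bat', 'hat']], rec=[['bat', 'cat']])
table = egg.plot('accuracy')     # prints, and returns the result
same = egg.analyze('accuracy')   # same result, no output
```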

We could also incorporate some of the caching tricks you just added to hypertools, which would solve the issue of having to re-compute time-intensive analyses each time they were needed. (I think we'd want the cache to be attached to the egg object...? And I'm not sure if the cache should be saved or not.)
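
One way a cache attached to the egg might look is sketched below. This is plain per-egg memoization keyed by analysis name, not the hypertools mechanism mentioned above; `CachingEgg`, `_cache`, and `n_computed` are hypothetical names, and whether the cache should be saved with the egg is left open.

```python
class CachingEgg:
    """Hypothetical egg that remembers analysis results so they are computed once."""

    def __init__(self, pres, rec):
        self.pres = pres
        self.rec = rec
        self._cache = {}      # analysis name -> previously computed result
        self.n_computed = 0   # counts real computations, for illustration only

    def _accuracy(self):
        self.n_computed += 1
        return [len(set(p) & set(r)) / len(p) for p, r in zip(self.pres, self.rec)]

    def analyze(self, analysis):
        if analysis != 'accuracy':
            raise ValueError(f"analysis not implemented in this sketch: {analysis}")
        if analysis not in self._cache:
            self._cache[analysis] = self._accuracy()
        # Return a copy so callers cannot modify the cached result by accident.
        return list(self._cache[analysis])


egg = CachingEgg(pres=[['cat', 'dog', 'bat', 'hat']], rec=[['bat', 'cat']])
first = egg.analyze('accuracy')    # computed
second = egg.analyze('accuracy')   # served from the cache
```

A real implementation would also need to invalidate or rebuild the cache whenever the underlying data changes.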

Finally, I like the idea of reducing complex analyses to a single line. I guess the output would then be a multi-level pandas dataframe (where at the highest level the columns were organized by analysis type)? I see this being most useful for caching, e.g. we could precompute all of the major analyses prior to saving out a new egg (note: we'd have to deal with combining caches gracefully when stacking/cracking eggs). Then all of the analyses on that object would run much faster.