ContextLab / quail

A python toolbox for analyzing and plotting free recall data
http://cdl-quail.readthedocs.io/en/latest/
MIT License
20 stars 10 forks source link

Enhanced egg and new results class #90

Closed andrewheusser closed 6 years ago

andrewheusser commented 6 years ago

This PR does the following:

the one thing I omitted was support for a recall matrix to be passed directly to Egg. This is because it also requires another parameter (list_length) and I thought it would make the Egg API more confusing. Instead, I left recmat2egg as a helper function which generates an Egg given a recall matrix. Another thing that this PR does NOT do is support 'naturalistic' stimuli. I plan to do that in a separate PR once this one is merged.

@jeremymanning would love to get your feedback when you have a chance, thanks!!

jeremymanning commented 6 years ago

It's coming along! Comments:

andrewheusser commented 6 years ago

Thanks!! A quick note: The reason I am storing the lambda functions as text and then evaluating is because of our new saved file format. If I save an egg with lambda functions, they are automatically pickled, which partially defeats the purpose of saving the eggs in the HDF5 format.

I attempted to solve this with the inspect package to try to extract the source code from the lambda function, the source lines were variable depending upon the code used to define the function. This made it tricky to extract the text. The one way I could think to solve this is with a regular expression that searches for the lambda keyword in the source and then snips it out to save it. Happy to spend some more time on solving this if you think its worthwhile

jeremymanning commented 6 years ago

Could we potentially use inspect? Something like this example: https://stackoverflow.com/questions/30983875/how-do-you-read-a-lambda-function-as-a-string

andrewheusser commented 6 years ago

Good find, thanks! I think this will work. But note that I will still have to use eval to convert them back into functions when they are loaded back in

andrewheusser commented 6 years ago

_remove quail/.DSStore ✔️

_add *.DSStore to .gitignore ✔️

_in create_egg example, what happens if nextfeatures is an empty list, or a list of empty dictionaries? We should leave features empty by default if none are specified. Similarly, we should have a default distance function of Euclidean distance if none is specified. features is None by default. If next_features is an empty list, it would need to be the same size as pres or else it will throw an error. If its a list of empty dictionaries (the same dims as pres), then it will run but no features will be added to pres. If no distance metrics are specified, quail will add defaults: lambda a, b: int(a!=b) if the feature is a string and lambda a, b: np.linalg.norm(np.subtract(a,b)) if the feature is a number.

_dist_funcs: I think the values should be actual function objects, not strings that define functions (which then get unpacked via eval-- which is generally a "dangerous" function to use). Using function objects will allow users to easily specify much more complex functions, e.g. as defined in many lines or imported from other libraries. Technically users could always define a wrapper around the function they wanted (e.g. 'lambda a, b: my_complexfunction(a, b)'), but this is clunkier than needed. i can use inspect to get around this issue, but will still need to use eval when loading the eggs back in from disk

We could define a set of built-in distance function (perhaps under quail.dist) such as: quail.dist.match = lambda a, b: int(a!=b) quail.distance.cdist = lambda a, b, s: scipy.spatial.distance.cdist(a, b, metric=s) (hide this one and/or separate it out in the documentation?) quail.dist.correlation = lambda a, b: quail.distance.cdist(a, b, 'correlation') quail.dist.euclidean = lambda a, b: quail.distance.cdist(a, b, 'euclidean') (can add similar support for all functions supported by cdist) etc. (although i think these two cover all of the distance functions we're using and/or have been discussing-- e.g. absolute difference is a special case of euclidean distance where n dimensions = 1) Sounds good, i'll make an issue for this!

_Instead of having a separate function recmat2egg, I still think it'll be cleaner to allow Eggs to be initialized using recall matrices (e.g. by wrapping recmat2egg). We could include a listlength keyword argument in the Egg init function that is ignored unless we're in the "recall matrix initialization" case. The reason for not having separate functions is that it will (eventually) allow us to flexibly and seamlessly work with multiple data formats, all within the same set of function calls. Separating functions that do similar things (that the user then has to keep track of) is going to make our lives more complicated later on, and will make the syntax clunkier for users. ✔️

I like the "FriedEgg" name :) :partyandy:

What's the point of df2list? Should it be exposed to users? this function turns a multilevel df (subjects, list) into a list(list(words)). This could be useful to users, so i'll document it and expose it!

andrewheusser commented 6 years ago

@jeremymanning this is almost ready to be merged. The one outstanding issue is how to store the egg distance functions. I am currently storing them as strings so that they don't have to be pickled when they are saved. I can change this so that they are lambda functions in memory, but then converted to strings when they are saved out.

One issue I am foreseeing with this is with custom distance functions that rely on packages not included in quail. If a custom function is created and then saved out as a string, when it is loaded back in, the package may not be there to recreate the function, so this will probably break.

I guess another option would be to just let the lambda functions be pickled, but specify a backwards compatible pickle format. However, I was hoping to avoid pickling given all the trouble we've been having with them...

jeremymanning commented 6 years ago

Suppose we (for now) table support for the custom distance functions-- could we then support function pickling?

Note: if we go with pickling, we should make sure that support saving/loading from Python 2/3 interchangeably, e.g. using these guidelines

andrewheusser commented 6 years ago

we can pickle the lambda functions...but I'd rather avoid pickling anything if possible. that's why I am currently storing them as string and then using eval to turn them back into functions. but i understand that is a potentially flawed approach as well

i think our options are:

1) turn the lambda functions in strings when they are saved, and then turn them back into functions upon reloading (or at runtime)

2) save the lambda functions as pickled objects and try to minimize python 2/3 incompatibilities as much as possible.

andrewheusser commented 6 years ago

note: the reason why we moved to using deepdish to save the files was to get away from using pickle

jeremymanning commented 6 years ago

i think our options are:

  1. turn the lambda functions in strings when they are saved, and then turn them back into functions upon reloading (or at runtime)

  2. save the lambda functions as pickled objects and try to minimize python 2/3 incompatibilities as much as possible.

let's go with option 2 (pickles!)-- it'll allow us to support a broader range of functions in the long term.

andrewheusser commented 6 years ago

what do you think about this:

  1. store everything as a lambda function in memory
  2. when saving, look up whether the dist func is built in. if it is save it as a string. if not, save it as a pickle and throw a warning about potential python2/3 compatibility issues
  3. when loading back in, if the dist func is a string, replace the string with the associated dist func

this approach would support flexible distance functions, but also minimize issues for regular users

jeremymanning commented 6 years ago

what about just supporting strings (but not lambda functions)? then the name of the function could get stored in the egg (to be filled in by whatever built-in function that string corresponds to when it's used). in other words, to avoid making this too complicated, we could just support built-in functions for the time being.

andrewheusser commented 6 years ago

ah, you mean like match, correlation, euclidean etc? That sounds good to me

andrewheusser commented 6 years ago

I replaced the lambda functions with strings and implemented match, euclidean and correlation for now.

I think this is ready to merge now, so once you give me the go ahead, I'll fix those merge conflicts listed above!

jeremymanning commented 6 years ago

🎉