Closed andrewheusser closed 6 years ago
It's coming along! Comments:
next_features
is an empty list, or a list of empty dictionaries? We should leave features empty by default if none are specified. Similarly, we should have a default distance function of Euclidean distance if none is specified.eval
-- which is generally a "dangerous" function to use). Using function objects will allow users to easily specify much more complex functions, e.g. as defined in many lines or imported from other libraries. Technically users could always define a wrapper around the function they wanted (e.g. 'lambda a, b: my_complex_function(a, b)'), but this is clunkier than needed.
quail.dist
) such as:quail.dist.match = lambda a, b: int(a!=b)
quail.distance.cdist = lambda a, b, s: scipy.spatial.distance.cdist(a, b, metric=s)
(hide this one and/or separate it out in the documentation?)quail.dist.correlation = lambda a, b: quail.distance.cdist(a, b, 'correlation')
quail.dist.euclidean = lambda a, b: quail.distance.cdist(a, b, 'euclidean')
dist_funcs
example definition using those built-in functions (without re-defining the same lambda function several times)recmat2egg
, I still think it'll be cleaner to allow Eggs to be initialized using recall matrices (e.g. by wrapping recmat2egg
). We could include a list_length
keyword argument in the Egg init function that is ignored unless we're in the "recall matrix initialization" case. The reason for not having separate functions is that it will (eventually) allow us to flexibly and seamlessly work with multiple data formats, all within the same set of function calls. Separating functions that do similar things (that the user then has to keep track of) is going to make our lives more complicated later on, and will make the syntax clunkier for users.Thanks!! A quick note: The reason I am storing the lambda functions as text and then evaluating is because of our new saved file format. If I save an egg with lambda functions, they are automatically pickled, which partially defeats the purpose of saving the eggs in the HDF5 format.
I attempted to solve this with the inspect package to try to extract the source code from the lambda function, the source lines were variable depending upon the code used to define the function. This made it tricky to extract the text. The one way I could think to solve this is with a regular expression that searches for the lambda keyword in the source and then snips it out to save it. Happy to spend some more time on solving this if you think its worthwhile
Could we potentially use inspect
? Something like this example: https://stackoverflow.com/questions/30983875/how-do-you-read-a-lambda-function-as-a-string
Good find, thanks! I think this will work. But note that I will still have to use eval to convert them back into functions when they are loaded back in
_remove quail/.DSStore ✔️
_add *.DSStore to .gitignore ✔️
_in create_egg example, what happens if nextfeatures is an empty list, or a list of empty dictionaries? We should leave features empty by default if none are specified. Similarly, we should have a default distance function of Euclidean distance if none is specified.
features is None by default. If next_features is an empty list, it would need to be the same size as pres or else it will throw an error. If its a list of empty dictionaries (the same dims as pres), then it will run but no features will be added to pres. If no distance metrics are specified, quail will add defaults: lambda a, b: int(a!=b)
if the feature is a string and lambda a, b: np.linalg.norm(np.subtract(a,b))
if the feature is a number.
_dist_funcs: I think the values should be actual function objects, not strings that define functions (which then get unpacked via eval-- which is generally a "dangerous" function to use). Using function objects will allow users to easily specify much more complex functions, e.g. as defined in many lines or imported from other libraries. Technically users could always define a wrapper around the function they wanted (e.g. 'lambda a, b: my_complexfunction(a, b)'), but this is clunkier than needed. i can use inspect to get around this issue, but will still need to use eval when loading the eggs back in from disk
We could define a set of built-in distance function (perhaps under quail.dist) such as: quail.dist.match = lambda a, b: int(a!=b) quail.distance.cdist = lambda a, b, s: scipy.spatial.distance.cdist(a, b, metric=s) (hide this one and/or separate it out in the documentation?) quail.dist.correlation = lambda a, b: quail.distance.cdist(a, b, 'correlation') quail.dist.euclidean = lambda a, b: quail.distance.cdist(a, b, 'euclidean') (can add similar support for all functions supported by cdist) etc. (although i think these two cover all of the distance functions we're using and/or have been discussing-- e.g. absolute difference is a special case of euclidean distance where n dimensions = 1) Sounds good, i'll make an issue for this!
_Instead of having a separate function recmat2egg, I still think it'll be cleaner to allow Eggs to be initialized using recall matrices (e.g. by wrapping recmat2egg). We could include a listlength keyword argument in the Egg init function that is ignored unless we're in the "recall matrix initialization" case. The reason for not having separate functions is that it will (eventually) allow us to flexibly and seamlessly work with multiple data formats, all within the same set of function calls. Separating functions that do similar things (that the user then has to keep track of) is going to make our lives more complicated later on, and will make the syntax clunkier for users. ✔️
I like the "FriedEgg" name :) :partyandy:
What's the point of df2list? Should it be exposed to users? this function turns a multilevel df (subjects, list) into a list(list(words)). This could be useful to users, so i'll document it and expose it!
@jeremymanning this is almost ready to be merged. The one outstanding issue is how to store the egg distance functions. I am currently storing them as strings so that they don't have to be pickled when they are saved. I can change this so that they are lambda functions in memory, but then converted to strings when they are saved out.
One issue I am foreseeing with this is with custom distance functions that rely on packages not included in quail. If a custom function is created and then saved out as a string, when it is loaded back in, the package may not be there to recreate the function, so this will probably break.
I guess another option would be to just let the lambda functions be pickled, but specify a backwards compatible pickle format. However, I was hoping to avoid pickling given all the trouble we've been having with them...
Suppose we (for now) table support for the custom distance functions-- could we then support function pickling?
Note: if we go with pickling, we should make sure that support saving/loading from Python 2/3 interchangeably, e.g. using these guidelines
we can pickle the lambda functions...but I'd rather avoid pickling anything if possible. that's why I am currently storing them as string and then using eval to turn them back into functions. but i understand that is a potentially flawed approach as well
i think our options are:
1) turn the lambda functions in strings when they are saved, and then turn them back into functions upon reloading (or at runtime)
2) save the lambda functions as pickled objects and try to minimize python 2/3 incompatibilities as much as possible.
note: the reason why we moved to using deepdish to save the files was to get away from using pickle
i think our options are:
turn the lambda functions in strings when they are saved, and then turn them back into functions upon reloading (or at runtime)
save the lambda functions as pickled objects and try to minimize python 2/3 incompatibilities as much as possible.
let's go with option 2 (pickles!)-- it'll allow us to support a broader range of functions in the long term.
what do you think about this:
this approach would support flexible distance functions, but also minimize issues for regular users
what about just supporting strings (but not lambda functions)? then the name of the function could get stored in the egg (to be filled in by whatever built-in function that string corresponds to when it's used). in other words, to avoid making this too complicated, we could just support built-in functions for the time being.
ah, you mean like match
, correlation
, euclidean
etc? That sounds good to me
I replaced the lambda functions with strings and implemented match, euclidean and correlation for now.
I think this is ready to merge now, so once you give me the go ahead, I'll fix those merge conflicts listed above!
🎉
This PR does the following:
quail.analyze
)load_egg
function to automatically update to the latest egg formatFriedEgg
that contains processed data (an analyzed egg), some parameters and a plot method (which wrapsquail.plot()
)the one thing I omitted was support for a recall matrix to be passed directly to
Egg
. This is because it also requires another parameter (list_length) and I thought it would make theEgg
API more confusing. Instead, I leftrecmat2egg
as a helper function which generates anEgg
given a recall matrix. Another thing that this PR does NOT do is support 'naturalistic' stimuli. I plan to do that in a separate PR once this one is merged.@jeremymanning would love to get your feedback when you have a chance, thanks!!