andrewheusser closed this issue 7 years ago.
@jeremymanning do you think we should keep the pyfingerprint codebase separate, or fully integrate it into pyrec?
i think clustering analyses (fingerprints) should go in pyrec, not be separate
currently, i have our features (and their distance metrics) hard-coded into pyfingerprint. However, if we want this to be flexible, a better model might be to let the user specify their own features and distance metrics. I am envisioning the features as a dataframe attached to the pyro object:
features : pd.DataFrame
DataFrame containing the features of the presented words. Each row represents the presented words for a given list, and each column represents a presentation position within that list.
Each cell should be a dictionary of features, where the keys are the feature names and the values are the feature values.
The index will be a MultiIndex, where the first level represents the subject number and the second level represents the list number.
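A minimal sketch of the proposed layout, using plain Python dicts as a stand-in for the DataFrame so it runs without pandas: each `(subject, list)` key plays the role of the two MultiIndex levels, each row holds one feature dict per presentation position, and each cell maps feature names to values. The feature names (`word`, `category`, `length`), the sample values, and the helper names are hypothetical.

```python
# Hypothetical feature table keyed by (subject, list), mirroring the proposed
# MultiIndex; each row is a list of per-word feature dicts in presentation order.
features = {
    (0, 0): [
        {"word": "cat", "category": "animal", "length": 3},
        {"word": "oak", "category": "tree", "length": 3},
        {"word": "horse", "category": "animal", "length": 5},
    ],
    (0, 1): [
        {"word": "pine", "category": "tree", "length": 4},
        {"word": "dog", "category": "animal", "length": 3},
        {"word": "maple", "category": "tree", "length": 5},
    ],
}


def feature_value(features, subject, lst, position, name):
    """Return one feature of the word shown at `position` (0-based) in a list."""
    return features[(subject, lst)][position][name]


def feature_names(features):
    """Collect every feature name used in any cell, sorted alphabetically."""
    names = set()
    for row in features.values():
        for cell in row:
            names.update(cell)
    return sorted(names)
```

For example, `feature_value(features, 0, 1, 2, "category")` looks up the third word of subject 0's second list and returns `"tree"`. With pandas, the same `(subject, list)` keys would become the row index via `pd.MultiIndex.from_tuples`.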
but then there should maybe be another field that specifies the distance metrics for each feature? @jeremymanning any thoughts on how we should set this up?
i like keeping things general; this sounds like a good design to me.
in terms of distance metrics, we could default to
we can also let users pass in different distance functions of the form `dist(arg1, arg2)` (must return a scalar value reflecting the distance between `arg1` and `arg2`).
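To make the `dist(arg1, arg2)` contract concrete, here is a sketch of a few functions that satisfy it, plus a small check that a user-supplied function really returns a scalar. The metrics shown (exact match, absolute difference, Euclidean) are illustrative choices only, not the defaults discussed in this thread, and `check_dist_func` is a hypothetical helper name.

```python
import math
from numbers import Real


def match_dist(x, y):
    """Categorical distance: 0 if the two values match, 1 otherwise."""
    return 0.0 if x == y else 1.0


def abs_dist(x, y):
    """Absolute difference between two numeric values."""
    return float(abs(x - y))


def euclidean_dist(x, y):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def check_dist_func(func, x, y):
    """Call a user-supplied dist(arg1, arg2) and verify the result is a real scalar."""
    result = func(x, y)
    # bool is a subclass of int, so reject it explicitly
    if isinstance(result, bool) or not isinstance(result, Real):
        raise TypeError(
            f"{func!r} must return a real scalar, got {type(result).__name__}"
        )
    return float(result)
```

Running every custom function through a check like this when the object is created would surface a bad return type early, rather than partway through a clustering analysis.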
ok, that sounds good. The way I'm planning to implement the 'custom' functions is as an optional dictionary passed when creating the pyro object:
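```python
# map feature names to custom distance functions; any feature not listed
# here falls back to the default distance metric
dist_funcs = {}
dist_funcs['category'] = lambda x, y: abs(x - y)  # assumes numeric feature values
pyro = Pyro(pres=pres_data, rec=rec_data, features=features_data, dist_funcs=dist_funcs)
```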
in this example, the 'category' feature will use the custom distance function, and the rest will use the default distance metric. sound good?
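A sketch of how that lookup could work: every feature gets its custom function when one appears in `dist_funcs`, and a fallback otherwise. `default_dist`, `resolve_dist_funcs`, and `feature_distances` are hypothetical names, and the fallback rule (absolute difference for numbers, 0/1 match for everything else) is an assumption for illustration, not the default proposed in this thread.

```python
def default_dist(x, y):
    """Hypothetical fallback: absolute difference for numbers, 0/1 match otherwise."""
    if isinstance(x, (int, float)) and isinstance(y, (int, float)):
        return float(abs(x - y))
    return 0.0 if x == y else 1.0


def resolve_dist_funcs(feature_names, dist_funcs=None):
    """Map every feature to its custom function if one was given, else the default."""
    dist_funcs = dist_funcs or {}
    # catch typos such as 'catgeory' instead of silently ignoring them
    unknown = set(dist_funcs) - set(feature_names)
    if unknown:
        raise KeyError(f"dist_funcs names unknown features: {sorted(unknown)}")
    return {name: dist_funcs.get(name, default_dist) for name in feature_names}


def feature_distances(word_a, word_b, resolved):
    """Per-feature distances between two words' feature dicts."""
    return {name: func(word_a[name], word_b[name]) for name, func in resolved.items()}


# usage: override only 'length' with a squared difference
resolved = resolve_dist_funcs(["category", "length"], {"length": lambda x, y: (x - y) ** 2})
dists = feature_distances(
    {"category": "animal", "length": 3},
    {"category": "tree", "length": 5},
    resolved,
)
```

Here `dists` is `{"category": 1.0, "length": 4}`: 'category' used the fallback and 'length' used the custom function. Resolving the mapping once, when the object is created, means the per-word loop never has to check which features were overridden.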
Integrate fingerprint analyses into pyrec. The code already exists in another repo. We can A) keep the repo separate and import the package as a dependency, or B) integrate it into pyrec and get rid of the other repo. Thoughts?