sam-lupton closed this issue 2 years ago
I am wondering the same exact thing. @sam-lupton, did you ever find a solution? Thank you!
@nicknelson1021 I went with a fairly hacky solution: I replaced the .index attribute on line 489 of utils.py with something that doesn't depend on pandas objects (e.g. range(len(frequency))), and then called all subsequent methods in my actual script with .values on the arguments (e.g. summary['recency'].values), as the methods work fine when you use ndarrays rather than pandas objects.
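For illustration, here is a rough sketch of that workaround. It uses a made-up stand-in function, not the real library code: clv_stub, its placeholder arithmetic, and the sample summary values are all hypothetical. Only the range(len(frequency)) replacement and the .values calls come from the description above.

```python
import numpy as np
import pandas as pd


def clv_stub(frequency, recency):
    # Hypothetical stand-in for the patched library function. The index is
    # built from positions, so frequency no longer has to be a pandas object.
    df = pd.DataFrame(index=range(len(frequency)))
    # Placeholder arithmetic only; this is NOT the real lifetime-value model.
    df["clv"] = np.asarray(frequency) * np.asarray(recency)
    return df


summary = pd.DataFrame(
    {"frequency": [2, 0, 5], "recency": [30.0, 0.0, 12.5]},
    index=["cust_a", "cust_b", "cust_c"],
)

# Pass plain ndarrays instead of pandas Series, as in the workaround.
result = clv_stub(summary["frequency"].values, summary["recency"].values)

# The positional index drops the customer IDs, so restore them afterwards.
result.index = summary.index
```

The last line matters in practice: with a positional index, the output rows only line up with customers by order, so reattaching the original index is the caller's job.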
I did this all in the local package file mind you, because I wasn't sure it's a good enough fix to contribute. Ideally you'd make all the functions work with both pandas objects and ndarrays!
On reflection though, the current version just arbitrarily takes the index of the frequency argument (rather than that of one of the other 3 array-likes passed in), with no indication to the user that this is the case, so anything would be better than that...
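One way the "work with both" idea could look is a small helper that reuses a pandas index when there is one and falls back to positions otherwise. This is only a sketch under assumptions: index_for is a hypothetical name, and nothing here is taken from the library or from the PR that fixed the issue.

```python
import numpy as np
import pandas as pd


def index_for(values):
    # Hypothetical helper: keep the pandas index if the input has one,
    # otherwise fall back to a positional RangeIndex.
    # The isinstance check matters because plain lists also have an
    # attribute called "index" (the list.index method).
    index = getattr(values, "index", None)
    if isinstance(index, pd.Index):
        return index
    return pd.RangeIndex(len(values))


freq_series = pd.Series([2, 0, 5], index=["a", "b", "c"])
freq_array = freq_series.values

from_series = pd.DataFrame(index=index_for(freq_series))  # keeps "a", "b", "c"
from_array = pd.DataFrame(index=index_for(freq_array))    # positional 0, 1, 2
from_list = pd.DataFrame(index=index_for([2, 0, 5]))      # positional 0, 1, 2
```

This still takes the index from a single argument, so it would be worth documenting which one is used.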
Issue solved with this PR
The input type of the frequency argument for _customer_lifetime_value is given as array_like. This isn't quite true, as frequency.index is used to instantiate a DataFrame in line one of the method here. This means only pandas Series/DataFrames can be used as the frequency. Unfortunately, if you do use pandas Series for all of the arguments, you often get a pandas NotImplemented error due to a supposed mixing of DataFrame and Series types. I will link the traceback of one such example below, where a GammaGamma model was fitted and called using a fitted ParetoNBD model. Could this be fixed, or some workarounds provided?
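A minimal sketch of the first problem, assuming only that the method builds its DataFrame from frequency.index as described above. build_frame is a hypothetical stand-in, not the library function, and it does not reproduce the NotImplemented error.

```python
import numpy as np
import pandas as pd


def build_frame(frequency):
    # Mirrors the pattern described above: the DataFrame index is taken
    # directly from frequency.index, so frequency must be a pandas object.
    return pd.DataFrame(index=frequency.index)


freq_series = pd.Series([2, 0, 5], index=["a", "b", "c"])
freq_array = freq_series.values  # plain ndarray, still "array_like"

frame = build_frame(freq_series)  # fine: a Series has an index

try:
    build_frame(freq_array)
    array_error = None
except AttributeError as exc:
    # ndarrays have no .index attribute, so the array_like promise breaks here.
    array_error = exc
```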