CamDavidsonPilon / lifetimes

Lifetime value in Python
MIT License

Not Implemented - TypeError: unsupported operand type(s) for +: 'int' and 'str' #436

Closed: SSMK-wq closed this issue 1 year ago

SSMK-wq commented 1 year ago

I used the lifetimes.utils.summary_data_from_transaction_data function to build summary data from my dataset, but got the error below.

My code looked like this:

clv = lifetimes.utils.summary_data_from_transaction_data(df_new,'unique_key','Resale Date','Revenue Resale EUR',observation_period_end='2022-05-22')
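The call passes the customer ID, date and revenue columns positionally. Before making it, it can help to check the dtype of each of those columns. The sketch below uses a small made-up frame with the same column names as df_new. The data is hypothetical, and dtype=object mimics text read from a file. The monetary column gets averaged, so it needs a numeric dtype.

```python
import pandas as pd

# Hypothetical stand-in for df_new: same column names, made-up values,
# stored as object dtype the way text read from a CSV/Excel file often is.
df_new = pd.DataFrame(
    {
        "unique_key": ["1001", "1001", "1002"],
        "Resale Date": ["2022-01-03", "2022-02-10", "2022-03-15"],
        "Revenue Resale EUR": ["120.50", "80", "45.25"],
    },
    dtype=object,
)

# Show the dtype of every column that will be passed to the summary function.
print(df_new[["unique_key", "Resale Date", "Revenue Resale EUR"]].dtypes)

# The monetary column is averaged per customer, so it must be numeric.
revenue_is_numeric = pd.api.types.is_numeric_dtype(df_new["Revenue Resale EUR"])
print(revenue_is_numeric)  # False: the values are strings
```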


NotImplementedError                       Traceback (most recent call last)
File ~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py:1578, in GroupBy._cython_agg_general.<locals>.array_func(values)
   1577 try:
-> 1578     result = self.grouper._cython_operation(
   1579         "aggregate", values, how, axis=data.ndim - 1, min_count=min_count
   1580     )
   1581 except NotImplementedError:
   1582     # generally if we have numeric_only=False
   1583     # and non-applicable functions
   1584     # try to python agg
   1585     # TODO: shouldn't min_count matter?

File ~\Anaconda3\lib\site-packages\pandas\core\groupby\ops.py:939, in BaseGrouper._cython_operation(self, kind, values, how, axis, min_count, **kwargs)
    938 ngroups = self.ngroups
--> 939 return cy_op.cython_operation(
    940     values=values,
    941     axis=axis,
    942     min_count=min_count,
    943     comp_ids=ids,
    944     ngroups=ngroups,
    945     **kwargs,
    946 )

File ~\Anaconda3\lib\site-packages\pandas\core\groupby\ops.py:626, in WrappedCythonOp.cython_operation(self, values, axis, min_count, comp_ids, ngroups, **kwargs)
    618     return self._ea_wrap_cython_operation(
    619         values,
    620         min_count=min_count,
   (...)
    623         **kwargs,
    624     )
--> 626 return self._cython_op_ndim_compat(
    627     values,
    628     min_count=min_count,
    629     ngroups=ngroups,
    630     comp_ids=comp_ids,
    631     mask=None,
    632     **kwargs,
    633 )

File ~\Anaconda3\lib\site-packages\pandas\core\groupby\ops.py:451, in WrappedCythonOp._cython_op_ndim_compat(self, values, min_count, ngroups, comp_ids, mask, result_mask, **kwargs)
    450     result_mask = result_mask[None, :]
--> 451 res = self._call_cython_op(
    452     values2d,
    453     min_count=min_count,
    454     ngroups=ngroups,
    455     comp_ids=comp_ids,
    456     mask=mask,
    457     result_mask=result_mask,
    458     **kwargs,
    459 )
    460 if res.shape[0] == 1:

File ~\Anaconda3\lib\site-packages\pandas\core\groupby\ops.py:516, in WrappedCythonOp._call_cython_op(self, values, min_count, ngroups, comp_ids, mask, result_mask, **kwargs)
    515 out_shape = self._get_output_shape(ngroups, values)
--> 516 func, values = self.get_cython_func_and_vals(values, is_numeric)
    517 out_dtype = self.get_out_dtype(values.dtype)

File ~\Anaconda3\lib\site-packages\pandas\core\groupby\ops.py:199, in WrappedCythonOp.get_cython_func_and_vals(self, values, is_numeric)
    197     return func, values
--> 199 func = self._get_cython_function(kind, how, values.dtype, is_numeric)
    201 if values.dtype.kind in ["i", "u"]:

File ~\Anaconda3\lib\site-packages\pandas\core\groupby\ops.py:164, in WrappedCythonOp._get_cython_function(cls, kind, how, dtype, is_numeric)
    162 if "object" not in f.signatures:
    163     # raise NotImplementedError here rather than TypeError later
--> 164     raise NotImplementedError(
    165         f"function is not implemented for this dtype: "
    166         f"[how->{how},dtype->{dtype_str}]"
    167     )
    168 return f

NotImplementedError: function is not implemented for this dtype: [how->mean,dtype->object]

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Input In [27], in <cell line: 1>()
----> 1 clv = lifetimes.utils.summary_data_from_transaction_data(df_new,'unique_key','Resale Date','Revenue Resale EUR',observation_period_end='2022-05-22')

File ~\Anaconda3\lib\site-packages\lifetimes\utils.py:323, in summary_data_from_transaction_data(transactions, customer_id_col, datetime_col, monetary_value_col, datetime_format, observation_period_end, freq, freq_multiplier, include_first_transaction)
    319     # by setting the monetary_value cells of all the first purchases to NaN,
    320     # those values will be excluded from the mean value calculation
    321     repeated_transactions.loc[first_purchases, monetary_value_col] = np.nan
    322     customers["monetary_value"] = (
--> 323         repeated_transactions.groupby(customer_id_col)[monetary_value_col].mean().fillna(0)
    324     )
    325     summary_columns.append("monetary_value")
    327 return customers[summary_columns].astype(float)

File ~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py:1956, in GroupBy.mean(self, numeric_only, engine, engine_kwargs)
   1954     return self._numba_agg_general(sliding_mean, engine_kwargs, "groupby_mean")
   1955 else:
-> 1956     result = self._cython_agg_general(
   1957         "mean",
   1958         alt=lambda x: Series(x).mean(numeric_only=numeric_only_bool),
   1959         numeric_only=numeric_only_bool,
   1960     )
   1961     return result.__finalize__(self.obj, method="groupby")

File ~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py:1592, in GroupBy._cython_agg_general(self, how, alt, numeric_only, min_count)
   1588     return result
   1590 # TypeError -> we may have an exception in trying to aggregate
   1591 # continue and exclude the block
-> 1592 new_mgr = data.grouped_reduce(array_func, ignore_failures=True)
   1594 if not is_ser and len(new_mgr) < len(data):
   1595     warn_dropping_nuisance_columns_deprecated(type(self), how)

File ~\Anaconda3\lib\site-packages\pandas\core\internals\base.py:199, in SingleDataManager.grouped_reduce(self, func, ignore_failures)
    193 """
    194 ignore_failures : bool, default False
    195     Not used; for compatibility with ArrayManager/BlockManager.
    196 """
    198 arr = self.array
--> 199 res = func(arr)
    200 index = default_index(len(res))
    202 mgr = type(self).from_array(res, index)

File ~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py:1586, in GroupBy._cython_agg_general.<locals>.array_func(values)
   1578     result = self.grouper._cython_operation(
   1579         "aggregate", values, how, axis=data.ndim - 1, min_count=min_count
   1580     )
   1581 except NotImplementedError:
   1582     # generally if we have numeric_only=False
   1583     # and non-applicable functions
   1584     # try to python agg
   1585     # TODO: shouldn't min_count matter?
-> 1586     result = self._agg_py_fallback(values, ndim=data.ndim, alt=alt)
   1588 return result

File ~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py:1540, in GroupBy._agg_py_fallback(self, values, ndim, alt)
   1535     ser = df.iloc[:, 0]
   1537 # We do not get here with UDFs, so we know that our dtype
   1538 # should always be preserved by the implemented aggregations
   1539 # TODO: Is this exactly right; see WrappedCythonOp get_result_dtype?
-> 1540 res_values = self.grouper.agg_series(ser, alt, preserve_dtype=True)
   1542 if isinstance(values, Categorical):
   1543     # Because we only get here with known dtype-preserving
   1544     # reductions, we cast back to Categorical.
   1545     # TODO: if we ever get "rank" working, exclude it here.
   1546     res_values = type(values)._from_sequence(res_values, dtype=values.dtype)

File ~\Anaconda3\lib\site-packages\pandas\core\groupby\ops.py:981, in BaseGrouper.agg_series(self, obj, func, preserve_dtype)
    978     preserve_dtype = True
    980 else:
--> 981     result = self._aggregate_series_pure_python(obj, func)
    983 npvalues = lib.maybe_convert_objects(result, try_float=False)
    984 if preserve_dtype:

File ~\Anaconda3\lib\site-packages\pandas\core\groupby\ops.py:1005, in BaseGrouper._aggregate_series_pure_python(self, obj, func)
   1003 for i, group in enumerate(splitter):
   1004     group = group.__finalize__(obj, method="groupby")
-> 1005     res = func(group)
   1006     res = libreduction.extract_result(res)
   1008     if not initialized:
   1009         # We only do this validation on the first iteration

File ~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py:1958, in GroupBy.mean.<locals>.<lambda>(x)
   1954     return self._numba_agg_general(sliding_mean, engine_kwargs, "groupby_mean")
   1955 else:
   1956     result = self._cython_agg_general(
   1957         "mean",
-> 1958         alt=lambda x: Series(x).mean(numeric_only=numeric_only_bool),
   1959         numeric_only=numeric_only_bool,
   1960     )
   1961     return result.__finalize__(self.obj, method="groupby")

File ~\Anaconda3\lib\site-packages\pandas\core\generic.py:11117, in NDFrame._add_numeric_operations.<locals>.mean(self, axis, skipna, level, numeric_only, **kwargs)
  11099 @doc(
  11100     _num_doc,
  11101     desc="Return the mean of the values over the requested axis.",
   (...)
  11115     **kwargs,
  11116 ):
> 11117     return NDFrame.mean(self, axis, skipna, level, numeric_only, **kwargs)

File ~\Anaconda3\lib\site-packages\pandas\core\generic.py:10687, in NDFrame.mean(self, axis, skipna, level, numeric_only, **kwargs)
  10679 def mean(
  10680     self,
  10681     axis: Axis | None | lib.NoDefault = lib.no_default,
   (...)
  10685     **kwargs,
  10686 ) -> Series | float:
> 10687     return self._stat_function(
  10688         "mean", nanops.nanmean, axis, skipna, level, numeric_only, **kwargs
  10689     )

File ~\Anaconda3\lib\site-packages\pandas\core\generic.py:10639, in NDFrame._stat_function(self, name, func, axis, skipna, level, numeric_only, **kwargs)
  10629     warnings.warn(
  10630         "Using the level keyword in DataFrame and Series aggregations is "
  10631         "deprecated and will be removed in a future version. Use groupby "
   (...)
  10634         stacklevel=find_stack_level(),
  10635     )
  10636     return self._agg_by_level(
  10637         name, axis=axis, level=level, skipna=skipna, numeric_only=numeric_only
  10638     )
> 10639 return self._reduce(
  10640     func, name=name, axis=axis, skipna=skipna, numeric_only=numeric_only
  10641 )

File ~\Anaconda3\lib\site-packages\pandas\core\series.py:4471, in Series._reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   4467     raise NotImplementedError(
   4468         f"Series.{name} does not implement {kwd_name}."
   4469     )
   4470 with np.errstate(all="ignore"):
-> 4471     return op(delegate, skipna=skipna, **kwds)

File ~\Anaconda3\lib\site-packages\pandas\core\nanops.py:93, in disallow.__call__.<locals>._f(*args, **kwargs)
     91 try:
     92     with np.errstate(invalid="ignore"):
---> 93         return f(*args, **kwargs)
     94 except ValueError as e:
     95     # we want to transform an object array
     96     # ValueError message to the more typical TypeError
     97     # e.g. this is normally a disallowed function on
     98     # object arrays that contain strings
     99     if is_object_dtype(args[0]):

File ~\Anaconda3\lib\site-packages\pandas\core\nanops.py:155, in bottleneck_switch.__call__.<locals>.f(values, axis, skipna, **kwds)
    153         result = alt(values, axis=axis, skipna=skipna, **kwds)
    154 else:
--> 155     result = alt(values, axis=axis, skipna=skipna, **kwds)
    157 return result

File ~\Anaconda3\lib\site-packages\pandas\core\nanops.py:410, in _datetimelike_compat.<locals>.new_func(values, axis, skipna, mask, **kwargs)
    407 if datetimelike and mask is None:
    408     mask = isna(values)
--> 410 result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
    412 if datetimelike:
    413     result = _wrap_results(result, orig_values.dtype, fill_value=iNaT)

File ~\Anaconda3\lib\site-packages\pandas\core\nanops.py:698, in nanmean(values, axis, skipna, mask)
    695     dtype_count = dtype
    697 count = _get_counts(values.shape, mask, axis, dtype=dtype_count)
--> 698 the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))
    700 if axis is not None and getattr(the_sum, "ndim", False):
    701     count = cast(np.ndarray, count)

File ~\Anaconda3\lib\site-packages\numpy\core\_methods.py:48, in _sum(a, axis, dtype, out, keepdims, initial, where)
     46 def _sum(a, axis=None, dtype=None, out=None, keepdims=False,
     47          initial=_NoValue, where=True):
---> 48     return umr_sum(a, axis, dtype, out, keepdims, initial, where)

TypeError: unsupported operand type(s) for +: 'int' and 'str'
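Read from the bottom up, the traceback shows the mechanism. lifetimes computes repeated_transactions.groupby(customer_id_col)[monetary_value_col].mean(), and the values being averaged have object dtype. pandas has no compiled mean for object arrays, which is the first NotImplementedError. Its pure-Python fallback then fails while summing an int and a str.

The sketch below reproduces that with made-up data. It then shows that the same aggregation works once the column is converted with pd.to_numeric. The exact error text varies between pandas versions, so the sketch only relies on the exception type.

```python
import numpy as np
import pandas as pd

# Made-up transactions: revenue stored as object dtype, mixing an int,
# a numeric string and a NaN (lifetimes itself sets first purchases to NaN).
tx = pd.DataFrame(
    {
        "customer": ["a", "a", "b"],
        "revenue": pd.Series([10, "20", np.nan], dtype=object),
    }
)

try:
    tx.groupby("customer")["revenue"].mean()
    failed = False
except TypeError as exc:
    # Message text differs across pandas versions; the type is TypeError.
    failed = True
    print(type(exc).__name__, exc)

# Coercing to a numeric dtype first makes the same aggregation succeed.
tx["revenue"] = pd.to_numeric(tx["revenue"])
means = tx.groupby("customer")["revenue"].mean()
print(means.to_dict())  # {'a': 15.0, 'b': nan}
```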

ColtAllen commented 1 year ago

Hey @SSMK-wq,

Do you mind sharing what the resolution was for this issue? I'll be refactoring this function in a future btyd release, and it seems like a validation check in the right place could clean up this error message considerably.
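A validation check of the kind suggested here could look roughly like the sketch below. The helper name validate_monetary_column and its error text are hypothetical and not part of lifetimes or btyd. The idea is to coerce the monetary column up front. If that fails, it raises an error naming the column and the offending values, rather than letting the failure surface deep inside pandas' groupby machinery.

```python
import pandas as pd


def validate_monetary_column(transactions: pd.DataFrame, monetary_value_col: str) -> pd.Series:
    """Hypothetical pre-check: return the column as numbers or raise a clear error."""
    col = transactions[monetary_value_col]
    if pd.api.types.is_numeric_dtype(col):
        return col
    # errors="coerce" turns unparseable entries into NaN so they can be reported.
    coerced = pd.to_numeric(col, errors="coerce")
    # Entries that were present but became NaN could not be parsed.
    bad = col[coerced.isna() & col.notna()]
    if not bad.empty:
        raise TypeError(
            f"Column {monetary_value_col!r} must be numeric; "
            f"could not convert values such as {bad.head(3).tolist()}"
        )
    return coerced


# Usage on made-up data containing one unparseable value.
df = pd.DataFrame({"rev": pd.Series(["1.5", "n/a"], dtype=object)})
try:
    validate_monetary_column(df, "rev")
    message = ""
except TypeError as exc:
    message = str(exc)
print(message)
# Column 'rev' must be numeric; could not convert values such as ['n/a']
```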

SSMK-wq commented 1 year ago

My customer ID, though unique, was in string format, so I converted it with astype('int64') and it worked fine.

(Sent by email in reply to Colt Allen's comment of Mon, 31 Oct 2022, 23:02.)
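The reporter's fix, casting the customer ID column to a 64-bit integer, can be sketched as follows on made-up data. This only works when every ID is a string of digits; otherwise astype raises a ValueError. The traceback itself fails in the mean over the monetary column ([how->mean,dtype->object]). So if the cast alone does not help, coercing the revenue column with pd.to_numeric may also be worth trying.

```python
import pandas as pd

# Made-up frame: IDs are digit strings held in an object-dtype column.
df_new = pd.DataFrame(
    {
        "unique_key": pd.Series(["1001", "1001", "1002"], dtype=object),
        "Revenue Resale EUR": [120.5, 80.0, 45.25],
    }
)

# The reporter's fix: cast the (all-digit) IDs to int64.
df_new["unique_key"] = df_new["unique_key"].astype("int64")
print(df_new["unique_key"].dtype)  # int64
```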

ColtAllen commented 1 year ago

Thanks. The functions as written should have been able to accept strings in the customer ID column; I wonder whether that column was actually of pandas object dtype.
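Whether a string customer ID is a problem on its own can be checked directly. In the sketch below, which uses made-up data, an object-dtype ID column works fine as a groupby key as long as the values being averaged are numeric. That fits the traceback, where the object dtype reported is that of the column passed to mean().

```python
import pandas as pd

# Customer IDs as strings in an object column; revenue already numeric.
tx = pd.DataFrame(
    {
        "customer": pd.Series(["c1", "c1", "c2"], dtype=object),
        "revenue": [10.0, 20.0, 5.0],
    }
)

print(tx["customer"].dtype)  # object
# Grouping by an object-dtype key and averaging a float column works.
means = tx.groupby("customer")["revenue"].mean()
print(means.to_dict())  # {'c1': 15.0, 'c2': 5.0}
```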

SSMK-wq commented 1 year ago

Yes, it was of pandas object dtype.

(Sent by email in reply to Colt Allen's comment of Tue, 1 Nov 2022, 02:38.)