Closed SSMK-wq closed 1 year ago
Hey @SSMK-wq, if your IDE is anything like mine, GammaGammaFitter is only showing up in auto-populate because it is already used elsewhere in your script or notebook. GammaGammaModel is the newer version of this model. All models suffixed with Fitter will be removed after BTYD is out of beta.
I do see a typo to fix in the import statement though, so thanks for bringing this to my attention.
So, would we be able to use 'GammaGammaModel' now? It doesn't work in the import statement. So, is it not available in the beta version yet?
What library version are you using? It was added in 0.1b2, and I just tried importing it in 0.1b3 and it worked fine.
I upgraded to 0.1b3 and tried again, but it threw the error below:
AttributeError: 'Series' object has no attribute 'columns'
I pass the same columns as input (as we did for GammaGammaFitter()). That is, my code looks like this:
ggf = GammaGammaModel() # model object updated to GGmodel() instead of GGfitter()
ggf.fit(monetary_cal_df['frequency_cal'],monetary_cal_df['avg_monetary_value_cal']) # model fitting
The full error message is shown below:
> ```
> ggf = GammaGammaModel() # model object updated to GGmodel() instead of GGfitter()
> ggf.fit(monetary_cal_df['frequency_cal'],monetary_cal_df['avg_monetary_value_cal']) # model fitting
> # Prediction of expected amount of average profit
> monetary_cal_df["expct_avg_spend"] = ggf.conditional_expected_average_profit(monetary_cal_df['frequency_cal'], monetary_cal_df['avg_monetary_value_cal'])
> ```
>
> AttributeError Traceback (most recent call last)
> Input In [64], in <cell line: 2>()
> 1 ggf = GammaGammaModel() # model object updated to GGmodel() instead of GGfitter()
> ----> 2 ggf.fit(monetary_cal_df['frequency_cal'],monetary_cal_df['avg_monetary_value_cal']) # model fitting
> 3 # Prediction of expected amount of average profit
> 4 monetary_cal_df["expct_avg_spend"] = ggf.conditional_expected_average_profit(monetary_cal_df['frequency_cal'], monetary_cal_df['avg_monetary_value_cal'])
>
> File ~\Anaconda3\lib\site-packages\btyd\models\__init__.py:89, in BaseModel.fit(self, rfm_df, tune, draws)
> 63 def fit(self, rfm_df: pd.DataFrame, tune: int = 1200, draws: int = 1200) -> SELF:
> 64 """
> 65 Fit a custom pymc model with parameter prior definitions to observed RFM data.
> 66
> (...)
> 80
> 81 """
> 83 (
> 84 self._frequency,
> 85 self._recency,
> 86 self._T,
> 87 self._monetary_value,
> 88 _,
> ---> 89 ) = self._dataframe_parser(rfm_df)
> 91 self._check_inputs(
> 92 self._frequency, self._recency, self._T, self._monetary_value
> 93 )
> 95 with self._model():
>
> File ~\Anaconda3\lib\site-packages\btyd\models\__init__.py:214, in BaseModel._dataframe_parser(self, rfm_df)
> 200 def _dataframe_parser(self, rfm_df: pd.DataFrame) -> Tuple[np.ndarray]:
> 201 """
> 202 Parse input dataframe into separate RFM components. This is an internal method and not intended to be called directly.
> 203
> (...)
> 211 Tuple containing numpy arrays for Recency, Frequency, Monetary Value, T, and Customer ID (if provided).
> 212 """
> --> 214 rfm_df.columns = rfm_df.columns.str.upper()
> 216 # The load_cdnow_summary_with_monetary_value() function needs an ID column for testing.
> 217 if "ID" not in rfm_df.columns:
>
> File ~\Anaconda3\lib\site-packages\pandas\core\generic.py:5575, in NDFrame.__getattr__(self, name)
> 5568 if (
> 5569 name not in self._internal_names_set
> 5570 and name not in self._metadata
> 5571 and name not in self._accessors
> 5572 and self._info_axis._can_hold_identifiers_and_holds_name(name)
> 5573 ):
> 5574 return self[name]
> -> 5575 return object.__getattribute__(self, name)
>
> AttributeError: 'Series' object has no attribute 'columns'
That's because GammaGammaModel has a streamlined API; the entire summary DF (of repeat customers) is passed in as a single argument rather than individual arrays. Try this instead:
ggm = GammaGammaModel()
# rename columns to `frequency` and `monetary_value` before fitting model
ggm.fit(monetary_cal_df)
exp_avg_spend = ggm.predict('avg_value')
clv = ggm.predict(
    'clv',
    transaction_prediction_model=bgm,  # this is a trained BetaGeoModel(). Fitter models cannot be used
    time=12,
    discount_rate=0.01,
    freq="D",
)
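For anyone hitting the same AttributeError, here is a pandas-only sketch of the data preparation step. The sample values are made up. The source column names `frequency_cal` and `avg_monetary_value_cal` come from the code earlier in this thread, and the target names `frequency` and `monetary_value` come from the comment above. Treating "repeat customers" as `frequency > 0` is an assumption. The model call is left commented out because it needs btyd installed.

```python
import pandas as pd

# Made-up calibration summary using the column names from this thread.
monetary_cal_df = pd.DataFrame(
    {
        "frequency_cal": [0, 2, 5, 1],
        "avg_monetary_value_cal": [0.0, 35.5, 12.25, 80.0],
    }
)

# Selecting a single column yields a Series, which has no `.columns`
# attribute. That is what triggers the AttributeError in the traceback.
single_column = monetary_cal_df["frequency_cal"]
has_columns = hasattr(single_column, "columns")  # False for a Series

# Keep repeat customers only (assumed here to mean frequency > 0) and
# rename the columns to the names mentioned in the comment above.
ggm_input = monetary_cal_df[monetary_cal_df["frequency_cal"] > 0].rename(
    columns={
        "frequency_cal": "frequency",
        "avg_monetary_value_cal": "monetary_value",
    }
)

# ggm = GammaGammaModel()
# ggm.fit(ggm_input)  # requires btyd; not run in this sketch
```

The point is that the whole DataFrame goes into `fit`, rather than two separately selected columns.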
Please note that the new Model objects take considerably longer to train, but they are less likely to overfit and also provide entire probability distributions for model interpretation and predictions:
The code cells below may require editing to run properly.
import arviz as az
import matplotlib.pyplot as plt
import seaborn as sns

# Fit proposed new BetaGeo Bayesian model
bgm = BetaGeoModel().fit(rfm_df)

# Use ArviZ to plot posterior parameter distributions against the MLE estimates
axes = az.plot_trace(
    data=bgm._idata,
    var_names=["BetaGeoModel::a", "BetaGeoModel::b", "BetaGeoModel::alpha", "BetaGeoModel::r"],
    compact=True,
    backend_kwargs={
        "figsize": (12, 9),
        "layout": "constrained"
    },
)
fig = axes[0][0].get_figure()
fig.suptitle("BG/NBD Model Trace")

# Infer p_alive distributions for each customer:
p_alive_full = bgm.predict('cond_prob_alive', sample_posterior=True)
p_alive = bgm.predict('cond_prob_alive')

# Plotting function to compare results
def plot_conditional_probability_alive(p_alive_full, p_alive, idx, ax):
    # Shaded density: posterior draws; dashed line: point estimate
    sns.kdeplot(x=p_alive_full[idx], color="C0", fill=True, ax=ax)
    ax.axvline(x=p_alive[idx], color="C1", linestyle="--")
    ax.set(title=f"idx={idx}")
    return ax

fig, axes = plt.subplots(
    nrows=3,
    ncols=3,
    figsize=(9, 9),
    layout="constrained"
)
for idx, ax in enumerate(axes.flatten()):
    plot_conditional_probability_alive(p_alive_full, p_alive, idx, ax)
fig.suptitle("Conditional Probability Alive", fontsize=16)
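To illustrate what a full distribution adds over a single point estimate, here is a sketch that uses synthetic draws in place of real model output. The Beta-distributed samples and the array layout (customers by draws) are assumptions for illustration only; the shape actually returned by `bgm.predict(..., sample_posterior=True)` may differ.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for posterior draws of p_alive: 3 customers x 1000 draws.
# Real draws would come from a fitted model; these are made up.
p_alive_draws = np.stack(
    [
        rng.beta(a=8, b=2, size=1000),  # probably still active
        rng.beta(a=2, b=2, size=1000),  # highly uncertain
        rng.beta(a=1, b=9, size=1000),  # probably churned
    ]
)

# Point summary and a 94% equal-tailed interval per customer.
posterior_mean = p_alive_draws.mean(axis=1)
lower, upper = np.quantile(p_alive_draws, [0.03, 0.97], axis=1)

for idx in range(p_alive_draws.shape[0]):
    print(
        f"customer {idx}: mean={posterior_mean[idx]:.2f}, "
        f"94% interval=({lower[idx]:.2f}, {upper[idx]:.2f})"
    )
```

Two customers with similar means can have very different interval widths, which is information a point estimate alone cannot convey.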
More information is provided in these PR writeups: https://github.com/ColtAllen/btyd/pull/24, https://github.com/ColtAllen/btyd/pull/33
@ColtAllen - What is avg_value in ggm.predict()? Where do we get that column from? Do you mean monetary_value?
And what is rfm_df in BetaGeoModel().fit(rfm_df)? Where do you get this rfm_df from?
Sorry, I couldn't find them in the documentation. Apologies if I missed it.
# avg_value is not a column. It's a string identifier for the predictive method
conditional_expected_average_profit = ggm.predict(method='avg_value')
rfm_df = summary_data_from_transaction_data(*args)
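For context on what an RFM summary table contains, here is a pandas-only sketch that builds one from a made-up transaction log. It follows the usual BTYD-style definitions (frequency as the number of repeat purchase days, recency as the time from first to last purchase, T as the time from first purchase to the end of the observation period). It is not the library's implementation, and the column names `customer_id` and `date` and the observation end date are assumptions.

```python
import pandas as pd

# Made-up transaction log; column names are illustrative.
transactions = pd.DataFrame(
    {
        "customer_id": [1, 1, 1, 2, 3, 3],
        "date": pd.to_datetime(
            [
                "2022-01-01", "2022-01-15", "2022-03-01",
                "2022-02-10", "2022-01-20", "2022-01-20",
            ]
        ),
    }
)
observation_end = pd.Timestamp("2022-03-31")

# Collapse multiple purchases on the same day into one purchase day.
purchase_days = transactions.drop_duplicates(["customer_id", "date"])

grouped = purchase_days.groupby("customer_id")["date"].agg(["min", "max", "count"])

rfm_df = pd.DataFrame(
    {
        # Purchase days after the first one
        "frequency": grouped["count"] - 1,
        # Days between first and last purchase
        "recency": (grouped["max"] - grouped["min"]).dt.days,
        # Days between first purchase and end of observation
        "T": (observation_end - grouped["min"]).dt.days,
    }
)
print(rfm_df)
```

In practice the library helper should be preferred; the sketch only shows where each column comes from.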
This conversation has also inspired me to refactor calibration_and_holdout_data so that it outputs separate calibration and holdout dataframes, because having to rename the columns in order to use it with any of the new models is cumbersome. While I'm at it I'll also rename it from calibration/holdout to train/test, because most people are more familiar with the latter convention.
Also, my bad: GammaGammaModel and ModBetaGeoModel aren't showing up in the API Reference. I'll get that updated ASAP.
Changes have been made to documentation. If there's nothing else, I'm gonna close this issue.
Apologies for the delay. I have been traveling, so I couldn't attend to this earlier.
I see that the documentation mentions the GGM model as shown below, but what is available (auto-populates on pressing the Tab key) is GammaGammaFitter and not GammaGammaModel as shown in the doc below. I guess it should be updated, or else the import statement doesn't work.