aloctavodia / BAP3

Figures and code examples from Bayesian Analysis with Python (third edition)
http://bap.com.ar/

Estimating difference in ARPU (using knowledge from chapter 2) #31

Open angrydozer opened 4 months ago

angrydozer commented 4 months ago

People, hey!

I've already read the first 2 chapters of the book (BAP3) – it's super useful.

After that I decided to experiment with models, and my goal is to estimate the difference in ARPU between countries. I started with the code below, but then got stuck because sampling seems to run forever. If someone can give me some advice 🤕 on this model and explain the problem, I would really appreciate it.

All data is stored in a pandas DataFrame called data, where each row is a user. It contains the following columns:
player_id – unique user id
country_code – US or JP
revenue_7 – cumulative revenue up to the 7th day of the user's life
is_payer – 0 or 1, computed from revenue_7 (0 if revenue is zero, 1 if it is greater than zero)

Currently my model looks like that:

import numpy as np
import pandas as pd
import pymc as pm

country = np.array(['US', 'JP'])
country_idx = pd.Categorical(data['country_code'], categories=country).codes
coords = {'country': country, 'country_flat': country[country_idx]}

# ChatGPT suggested this to ignore zero values; it allowed me to use a Gamma distribution
revenue_observed = np.where(data['is_payer'] == 1, data['revenue_7'], np.nan)

with pm.Model(coords=coords) as model:

    p = pm.Beta('p', alpha=1, beta=1, dims='country')
    y = pm.Bernoulli('y', p=p[country_idx], observed=data['is_payer'])

    mu = pm.HalfNormal('mu', sigma=10, dims='country')
    sigma = pm.HalfNormal('sigma', sigma=15, dims='country')
    revenue = pm.Gamma('revenue', mu=mu[country_idx], sigma=sigma[country_idx], observed=revenue_observed)

    arpu = pm.Deterministic('arpu', p * mu, dims='country')

    idata = pm.sample()
    idata.extend(pm.sample_posterior_predictive(idata))

aloctavodia commented 4 months ago

Can you share the data? Or something that looks like your data?

Maybe instead of a Gamma, you want to use a HurdleGamma? I don't mind you or others asking general modelling questions here, but if your questions are not directly book-related you can post them on PyMC's discord. You will get more people looking at your questions and potentially more answers there.
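
One way to see why a hurdle-style split suits ARPU: mean revenue over all users equals the share of payers times the mean revenue among payers, which is the p * mu product in the model above. The sketch below checks that identity on synthetic data using only the Python standard library. The simulate_revenue helper and every parameter value (pay_prob, shape, scale) are made-up assumptions for illustration, not the real data or code from the book.

```python
import random

random.seed(0)


def simulate_revenue(n_users, pay_prob, shape, scale):
    """Hypothetical data: a user pays with probability pay_prob;
    payers get Gamma-distributed revenue, everyone else gets 0."""
    return [
        random.gammavariate(shape, scale) if random.random() < pay_prob else 0.0
        for _ in range(n_users)
    ]


# Assumed values, chosen only to produce mostly-zero revenue.
revenue_7 = simulate_revenue(n_users=10_000, pay_prob=0.1, shape=2.0, scale=5.0)

payers = [r for r in revenue_7 if r > 0]

p_hat = len(payers) / len(revenue_7)            # share of paying users
mu_hat = sum(payers) / len(payers)              # mean revenue among payers only
arpu_direct = sum(revenue_7) / len(revenue_7)   # mean revenue over all users
arpu_split = p_hat * mu_hat                     # payer share times payer mean

print(f"p = {p_hat:.3f}, mu = {mu_hat:.2f}")
print(f"ARPU direct = {arpu_direct:.4f}, p * mu = {arpu_split:.4f}")
```

The two ARPU figures agree up to floating-point error. A hurdle distribution such as HurdleGamma expresses the same split in a single likelihood: a probability of a nonzero value plus a Gamma for the positive values, so the zeros do not need to be replaced with NaN.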