Minyus / causallift

CausalLift: Python package for causality-based Uplift Modeling in real-world business
https://causallift.readthedocs.io/

Some clarification about the code #13

Closed Jami1141 closed 4 years ago

Jami1141 commented 4 years ago

Hi,

I would like to ask you some questions about your code (CausalLift).

My first question: what are the advantages of using CausalLift? I mean, if we have a binary treatment, I could train an XGBoost estimator for each treatment group and model them separately (an untreated model and a treated model), then calculate the conversion probability for each individual under each model. Now, if a new student comes in, I feed its features to the untreated model to get a conversion probability, and to the treated model to get another conversion probability. If the difference between the two models' probabilities is positive, I recommend treating this person; if the difference is negative, I don't recommend treatment.

Now, in CausalLift, we split the treated and untreated samples into two separate models. Our goal is to calculate the uplift: P(buy|treated) - P(buy|untreated). But how do we calculate uplift for one person, who cannot be treated and untreated at the same time? And why do we call this model "causal"? Please explain the difference from the first approach.

Thanks in advance.

Minyus commented 4 years ago

I wrote the advantages of CausalLift at:

https://github.com/Minyus/causallift#what-are-the-advantages-of-causallift-package https://github.com/Minyus/causallift#why-causallift-was-developed

It is true that it is impossible to calculate the uplift score for one person who cannot be treated and untreated at the same time. We can only estimate the uplift score using the features.

The basic concept of supervised learning is that the outcomes of samples (e.g. customers, patients, etc.) with similar features will be similar.

Uplift Modeling, which works on top of supervised learning, uses the same concept.

A positive uplift score (CATE) means it is estimated that the treatment causes the outcome. That's why CausalLift is "Causal".

Jami1141 commented 4 years ago

Thanks for your explanation. I have the following questions as well. You mentioned that you estimate uplift using features, and that "The basic concept of supervised learning is that outcome of samples (e.g. customers, patients, etc.) with similar features will be similar." I agree with this. But then how does it work in CausalLift? For example, we want to know how much better students with treatment 1 convert than if they were not treated at all. How do we calculate uplift in this case? If student 1 under treatment 1 has a conversion probability of 0.3, how do we calculate the uplift for this person using similar features?

After that, I also want to know how you evaluate the results with simulation and later calculate the improved predicted conversion rate. What do you simulate, and why?

It would be great if you could explain step by step, from the beginning, how we estimate uplift.

Thanks in advance,

Minyus commented 4 years ago

For example, we want to know how much better students with treatment 1 convert than if they were not treated at all. How do we calculate uplift in this case? If student 1 under treatment 1 has a conversion probability of 0.3, how do we calculate the uplift for this person using similar features?

Short answer: Use a supervised model trained using samples with actual Treatment = 0.

Here are the full steps.

  1. Train a supervised model (e.g. XGBoost) using only samples with actual Treatment = 0
  2. Train a supervised model (e.g. XGBoost) using only samples with actual Treatment = 1
  3. Predict P(buy|untreated): the conversion rate (converted probability) setting the Treatment = 0 for all the samples regardless of the actual Treatment
  4. Predict P(buy|treated): the conversion rate (converted probability) setting the Treatment = 1 for all the samples regardless of the actual Treatment
  5. Calculate the estimated uplift score = P(buy|treated) - P(buy|untreated)

Please note that the actual outcome and treatment are not used to estimate the uplift score after training the 2 supervised models; they are used only to evaluate the estimated uplift score later. This way, the uplift score can be estimated for new data without knowing future treatment and outcome values.
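The five steps above can be sketched in plain Python. This is a toy illustration, not CausalLift's API: a per-feature conversion-rate table stands in for an XGBoost classifier, and all names and numbers are made up.

```python
# Toy two-model uplift estimation. A real pipeline would use a classifier
# such as XGBoost; here a conversion-rate table per feature value stands in.

def train(samples):
    """Learn P(converted = 1 | feature) from the given samples."""
    counts = {}
    for s in samples:
        n, k = counts.get(s["feature"], (0, 0))
        counts[s["feature"]] = (n + 1, k + s["converted"])
    return {f: k / n for f, (n, k) in counts.items()}

data = [
    {"feature": "A", "treatment": 1, "converted": 1},
    {"feature": "A", "treatment": 1, "converted": 1},
    {"feature": "A", "treatment": 0, "converted": 0},
    {"feature": "A", "treatment": 0, "converted": 1},
    {"feature": "B", "treatment": 1, "converted": 0},
    {"feature": "B", "treatment": 0, "converted": 0},
]

# Steps 1-2: one model per actual treatment group
model_t0 = train([s for s in data if s["treatment"] == 0])  # untreated model
model_t1 = train([s for s in data if s["treatment"] == 1])  # treated model

# Steps 3-5: score every sample with both models, regardless of actual treatment
for s in data:
    p_untreated = model_t0[s["feature"]]  # P(buy|untreated)
    p_treated = model_t1[s["feature"]]    # P(buy|treated)
    s["uplift"] = p_treated - p_untreated
```

In this toy data, samples with feature "A" get an estimated uplift of 1.0 - 0.5 = 0.5, while "B" samples get 0.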

After that, I also want to know how you evaluate the results with simulation and later calculate the improved predicted conversion rate. What do you simulate, and why?

Optionally, CausalLift can simulate the outcome by setting Treatment = 1 for samples with high estimated uplift scores and Treatment = 0 for samples with low estimated uplift scores, regardless of the actual Treatment. This way, you can estimate the effect of deciding whether to treat each sample based on the estimated uplift score and, hopefully, convince your business stakeholders, who are often not familiar with causal inference, to use the estimated uplift scores.

Jami1141 commented 4 years ago

Thank you for the explanation. You mentioned that:

"Here are the full steps.

  1. Train a supervised model (e.g. XGBoost) using only samples with actual Treatment = 0
  2. Train a supervised model (e.g. XGBoost) using only samples with actual Treatment = 1
  3. Predict P(buy|untreated): the conversion rate (converted probability) setting the Treatment = 0 for all the samples regardless of the actual Treatment
  4. Predict P(buy|treated): the conversion rate (converted probability) setting the Treatment = 1 for all the samples regardless of the actual Treatment
  5. Calculate the estimated uplift score = P(buy|treated) - P(buy|untreated)

Please note that the actual conversion rate values (P(buy|untreated) and P(buy|treated)) are not used to estimate the uplift score, and used only to evaluate the estimated uplift score later. This way, uplift score can be estimated for new data without knowing future treatment and outcome values."

I understood that we predict P(buy|untreated) and P(buy|treated) using two models for all samples, regardless of their actual treatment, and then calculate the difference between these probabilities for each individual. This is the uplift for each person. Then how do we look for the persuadable ones? Is it possible to get as output which students are persuadable? And the actual probability values are only used later for evaluation. Is that correct?

Minyus commented 4 years ago

My apologies. There was a mistake in my previous post. Here is the corrected one:

Please note that the actual outcome and treatment are not used to estimate the uplift score after training the 2 supervised models; they are used only to evaluate the estimated uplift score later. This way, the uplift score can be estimated for new data without knowing future treatment and outcome values.

Samples with positive estimated uplift scores are the persuadable ones.

Persuadable = estimated uplift score > 0
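As a minimal sketch (field names and numbers are made up, not CausalLift's output format):

```python
# Persuadables are the samples whose estimated uplift score is positive
scored = [
    {"id": 1, "uplift": 0.30},   # persuadable: recommend treatment
    {"id": 2, "uplift": -0.10},  # not persuadable: do not treat
    {"id": 3, "uplift": 0.05},   # persuadable: recommend treatment
]
persuadable_ids = [s["id"] for s in scored if s["uplift"] > 0]  # [1, 3]
```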
Jami1141 commented 4 years ago

Ok thanks. Now following your explanation, I have another question: you mentioned that: "Optionally, CausalLift can simulate the outcome by setting Treatment = 1 for the samples with high estimated uplift scores and setting Treatment = 0 for the samples with low estimated uplift scores regardless of the actual Treatment."

Does "high estimated uplift" mean uplift > 0 and "low estimated uplift" mean uplift < 0 for the simulation?

Another question, related to the observed and predicted conversion rates in the treated and untreated models: the observed conversion rate is the average of the converted outcomes, right?

The same for the simulation: the observed conversion rate is the average of the converted cases?

I have attached 3 files. In the last file, for the treated sample with and without uplift models, the observed conversion rate without uplift modeling is the actual conversion rate, right? And then we have a predicted improvement rate for each of the treated and untreated groups. What does predicted improvement mean? What should this value be to show that the uplift scores are good to use? And how close should this value be between the train and test sets in the treated or untreated group (0 and 1 in your code)?

What is the goal of obtaining the predicted improvement rate for each of the treated and untreated groups and for all cases in "Show the estimated effect of recommendation based on the uplift model"? If we have a predicted improvement of, say, 1.2 for the treated test sample, 0.7 for the untreated test sample, and 1.1 for the test sample of the whole dataset, what does this mean? Which value is more important to check whether the treatment was effective and the uplift scores are useful?

Thanks in advance.


And can you explain the following? I want to know what the bridge is between two approaches. First: we separately model treated and untreated samples using XGBoost; if a new student comes in, we take the new student's features and feed them to the treated model and to the untreated model, and calculate each model's conversion probability. Then we subtract the untreated model's conversion probability from the treated model's. If the difference is positive, we should treat the new student; if it is negative, we should not.

And second, CausalLift: we separately model treated and untreated samples; then, setting Treatment = 0, we feed all samples to the untreated model, and, setting Treatment = 1, we feed all samples to the treated model. Then we calculate the uplift. If the uplift is positive, the student should be treated; if it is negative, they should not be.

I want to know what the difference between these two approaches is. Which do you think is more accurate?

Another question:

you mentioned that:

"Predict P(buy|untreated): the conversion rate (converted probability) setting the Treatment = 0 for all the samples regardless of the actual Treatment

Predict P(buy|treated): the conversion rate (converted probability) setting the Treatment = 1 for all the samples regardless of the actual Treatment"

I have two questions here:

- If we have a new student and want to calculate their uplift, does the model here also set Treatment to 0 and feed them to the untreated model, and set it to 1 and feed them to the treated model? For a new student we do not have a treatment feature; we only add a treatment feature and give it a random value.
- For a new student, we only use the features, feed them once to the treated model and once to the untreated model, and then calculate the difference? That's the same as when we separately model treated and untreated samples and, for a new student, feed them to the treated model and the untreated model and take the difference between the conversion probabilities, right? So for prediction, CausalLift does exactly the same as two separate models, or is there something else I'm not considering? I want to understand the difference between using two separate XGBoost models for treated and untreated and using CausalLift for prediction.
- I already use these two separate models, and I want to know why I should use CausalLift rather than my two models for prediction.
- If we set all samples to Treatment = 0 or all to Treatment = 1 to obtain P(buy|untreated) and P(buy|treated) respectively, is there any difference? I mean, yes, the probabilities of the two models differ, of course, but does setting all Treatment values to 0 or 1 matter for XGBoost, since it is based on decision trees? If all values of the treatment feature (or any other feature) are the same (either all 0 or all 1), this feature is kind of useless, no?

Minyus commented 4 years ago

high estimated uplift means uplift>0 and low estimated uplift means uplift<0 for using for simulation?

No. It depends on the treatment_fraction_train and treatment_fraction_test parameters: the fraction (portion) of treatment for the train and test datasets, respectively. For example, if you set the parameters as follows, CausalLift sets Treatment = 1 for the samples with the highest 20% of estimated uplift scores in the simulation, for each of the train and test datasets.

cl.estimate_recommendation_impact(treatment_fraction_train=0.2, treatment_fraction_test=0.2)

I designed it this way because treatment, such as showing advertisements or giving medicine, is often expensive, and in real business there is a budget for treatment. If the budget limits treatment to 20% of samples, for example, you can specify it as above.

If you do not specify these parameters, the treatment fractions observed in the train and test datasets are used.
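The top-fraction selection described above can be sketched as follows. `assign_treatment` is a made-up helper for illustration, not part of CausalLift's API:

```python
def assign_treatment(uplift_scores, treatment_fraction):
    """Set Treatment = 1 for the top `treatment_fraction` of samples
    ranked by estimated uplift score, and Treatment = 0 for the rest."""
    n_treat = int(round(len(uplift_scores) * treatment_fraction))
    # Sample indices ordered from highest to lowest uplift score
    ranked = sorted(range(len(uplift_scores)),
                    key=lambda i: uplift_scores[i], reverse=True)
    treated = set(ranked[:n_treat])
    return [1 if i in treated else 0 for i in range(len(uplift_scores))]

scores = [0.4, -0.2, 0.1, 0.3, -0.1]
assign_treatment(scores, 0.2)  # only the single highest-uplift sample is treated
```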

the observed conversion rate is the average of the converted outcomes, right?

Yes.

observed conversion rate is the average of converted cases?

Yes.

observed conversion rate without uplift modeling is the actual converted rate right?

Yes.

what does predicted improvement mean?

The effect of following the uplift scores.

predicted improvement rate = predicted conversion rate using uplift model / observed conversion rate without uplift model

A predicted improvement rate larger than 1 is good. The values for the train and test datasets should be close if the two datasets are similar and the supervised learning models (e.g. XGBoost) are not overfit.
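A worked example with made-up numbers:

```python
# Predicted improvement rate, per the formula above (numbers are made up)
observed_conversion_rate = 0.10    # without uplift modeling
predicted_conversion_rate = 0.13   # following the uplift model's recommendations
predicted_improvement_rate = predicted_conversion_rate / observed_conversion_rate
# 1.3 > 1, so following the uplift scores is predicted to improve conversion
```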

What is the goal of obtaining the predicted improvement rate for each of the treated and untreated groups and for all cases?

The overall summary is the most important. The predicted improvement rate for each of the treated and untreated groups might be helpful for further research.

what is the difference between these two approaches?

I don't see any difference. The estimated uplift score will be the same.

if we have a new student and we want to calculate their uplift

Use the model trained on untreated samples (Treatment = 0) and the model trained on treated samples (Treatment = 1).

CausalLift for prediction does exactly the same as two separate models

Yes, CausalLift trains 2 supervised learning models (e.g. XGBoost) and uses them to predict the estimated uplift score. It seems you implemented uplift modeling the same way I did. The difference is that my code is provided as a Python package, including evaluation of uplift scores by simulation.

if all values of the treatment feature (or any other feature) are the same (either all 0 or all 1), this feature is kind of useless, no?

A feature with a single value (e.g. all 0 or all 1) is useless. Note that Treatment is not used as a feature to train either of the 2 supervised learning models (e.g. XGBoost).

Jami1141 commented 4 years ago

You mentioned:

"For example, if you set parameters as follows, CausalLift sets Treatment = 1 for the samples with the highest 20% of estimated uplift scores to simulate for each of train and test datasets.

cl.estimate_recommendation_impact(treatment_fraction_train=0.2, treatment_fraction_test=0.2)

I designed this way because treatment, such as advertisement and giving medicines, is often expensive and there is budget for treatment in real business. If the budget is limited to treatment for 20% of samples, for example, you can specify as above. If you do not specify these parameters, the treatment fraction in train and test datasets are used."

OK, suppose we don't set any fraction (meaning it takes 0.2 for the treatment fraction in the train and test datasets); then in the simulation we set Treatment to 1 for the top 20% of all samples by uplift and Treatment to 0 for the other 80%?

Minyus commented 4 years ago

No, the treatment fractions observed in the train and test datasets are used.

For example, if there are 15% of treated samples in train dataset and 14% of treated samples in test dataset, treatment_fraction_train and treatment_fraction_test will be set to 0.15 and 0.14, respectively, if not specified.

Jami1141 commented 4 years ago

I will repeat what you explained as I understood it: when we want to run the simulation, we reuse the two models (treated and untreated), but now we set Treatment to 1 for the higher uplift scores and Treatment to 0 for the lower ones.

If we don't set any value in cl.estimate_recommendation_impact, then by default it takes the samples with the highest uplift scores, up to the treatment fraction in the train and test sets, and sets Treatment to 1 for them. But when we set Treatment to 1 for the top treatment fraction in the train and test sets, the rest of the treatments will be 0 and belong to the untreated sample, right?

Minyus commented 4 years ago

Yes, Treatment is set to 0 if not set to 1 by cl.estimate_recommendation_impact. If treatment_fraction_train = 0.20, for example, the 20% of samples with high uplift scores use the model for treated (Treatment = 1) and the rest (80%) of samples with low uplift scores use the model for untreated (Treatment = 0) for prediction.
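That routing can be sketched with toy per-feature models (all probabilities are made up; this is not CausalLift's internal code):

```python
# Simulated conversion: samples recommended for treatment are scored with the
# treated model; all other samples are scored with the untreated model.
p_untreated = {"A": 0.5, "B": 0.2}  # P(buy|untreated) by feature value
p_treated = {"A": 0.9, "B": 0.1}    # P(buy|treated) by feature value

# (feature, simulated treatment) pairs after the top-fraction assignment
samples = [("A", 1), ("A", 0), ("B", 0)]
simulated = [p_treated[f] if t == 1 else p_untreated[f] for f, t in samples]
predicted_conversion_rate = sum(simulated) / len(simulated)
```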

Jami1141 commented 4 years ago

OK, thank you so much for answering all the questions in detail.

Jami1141 commented 4 years ago

Dear Mr Minami,

I have a question about the "predicted improvement rate". You said this value should be greater than 1 to conclude that there is uplift from the treatment. But is there any range telling how large this value should be? For example, what is the difference between a predicted improvement rate of 1.3 and one of 2.5? Both are greater than 1, but the second is higher. How much higher is good?

Thanks in advance, Narges


Minyus commented 4 years ago

The higher, the better. There is no fixed range.