Closed by jgdpsingh 1 year ago.
Hi, I'm sorry to hear you encountered some setbacks during your analysis. I'll need some more details to be able to help. Do you have a sample of the data I could use to reproduce the problem (or can you synthesize a minimal example that causes the problem)? How was the propensity model defined? What exactly do the evaluation plots look like?
On the face of it, there shouldn't be any constraints on using categorical variables. [The exception is having >=3 treatment levels, in which case most evaluations are not well-defined, but estimation should still be valid.]
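As a rough illustration of using all-categorical covariates, here is a minimal sketch: one-hot encode the survey answers, fit a logistic regression propensity model, and derive inverse probability weights. The survey columns and data are synthetic and hypothetical. causallib's `IPW` estimator accepts a scikit-learn classifier such as this one as its propensity learner, but to avoid depending on any particular causallib signature the sketch computes the weights directly with scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical survey where every covariate is categorical (synthetic data, illustration only)
rng = np.random.default_rng(0)
n = 500
survey = pd.DataFrame({
    "age_group": rng.choice(["18-29", "30-49", "50+"], size=n),
    "region": rng.choice(["north", "south", "east", "west"], size=n),
    "owns_home": rng.choice(["yes", "no"], size=n),
})

# One-hot encode the categorical answers so a standard classifier can use them.
# drop_first=True drops one level per variable to avoid perfectly collinear dummies.
X = pd.get_dummies(survey, drop_first=True).astype(float)

# Synthetic binary treatment whose probability depends on two of the dummies
logit = -0.5 + 1.0 * X["owns_home_yes"] + 0.8 * X["age_group_50+"]
a = pd.Series(rng.binomial(1, (1 / (1 + np.exp(-logit))).to_numpy()), name="treatment")

# Propensity model: estimated P(treatment = 1 | covariates)
propensity_model = LogisticRegression(max_iter=1000)
propensity_model.fit(X, a)
propensity = propensity_model.predict_proba(X)[:, 1]

# Inverse probability weights: 1/p for treated units, 1/(1-p) for untreated units
weights = np.where(a == 1, 1 / propensity, 1 / (1 - propensity))
print(X.columns.tolist())
```

Nothing in this pipeline requires numeric inputs: after encoding, every covariate is a 0/1 column, which the propensity model handles like any other feature.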
As for the BRCG method, it is not part of causallib, so I can't speak for it. However, as far as I know, this method first binarizes continuous features, so having no continuous features in the first place shouldn't be an obstacle.
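To illustrate why categorical inputs are not an obstacle in principle, here is a minimal sketch that turns categorical answers into binary literals of the kind a rule learner combines into rules. The helper `binarize_categorical` and the example columns are hypothetical; this is not AIX360's own binarizer, only the general idea.

```python
import pandas as pd

def binarize_categorical(df: pd.DataFrame) -> pd.DataFrame:
    """Turn each categorical column into boolean literals 'col == v' and 'col != v'.

    Hypothetical helper for illustration: rule learners such as BRCG build
    rules as conjunctions of binary literals like these.
    """
    literals = {}
    for col in df.columns:
        values = df[col].astype(str)
        for value in sorted(values.unique()):
            is_value = values == value
            literals[f"{col} == {value}"] = is_value
            literals[f"{col} != {value}"] = ~is_value
    return pd.DataFrame(literals, index=df.index)

survey = pd.DataFrame({
    "owns_home": ["yes", "no", "yes"],
    "region": ["north", "south", "north"],
})
B = binarize_categorical(survey)
print(B.columns.tolist())
```

Continuous features need thresholds chosen before they become literals; categorical features map to literals directly, so their presence alone shouldn't prevent rules from being formed.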
I have attached the results and the sample data. I used a logistic regression classifier on the training set, then identified the treatment variable and applied the same code as in your biomed example.
Thanks for your help!
Hey @jgdpsingh , these plots actually look fairly good! 🙂 What do you think is the problem here?
Actually, in the Bank Marketing example, it was suggested that the mean differences in the covariate balance Love plot should be minimised. In my categorical dataset, the mean differences seemed a bit large, so I applied the BRCG method along the same lines as in the example to find rules that minimise those differences. But no rules were returned. So I just wanted to check whether the library actually works for survey datasets.
But going by your feedback, I guess it does work well enough for surveys. Will test it further. Thanks a lot!
I apologize for the confusion in the Bank Marketing example. I will revisit it and see whether the wording can be improved.
For the sake of completeness: the Love plot shows the absolute standardized mean difference (ASMD) of each covariate between the treatment groups, both unweighted and after inverse probability weighting. Large ASMD values hint that the treatment groups differ and that their covariate distributions are imbalanced. Large unweighted ASMD values are expected in non-randomized settings, because individuals self-select into treatment groups. However, if the weighting is successful, the weighted ASMD values should decrease, ideally all to as close to zero as possible.
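To make the Love plot quantities concrete, here is a minimal sketch of an unweighted and a weighted ASMD for a single covariate. It assumes one common definition, |mean1 - mean0| / sqrt((var1 + var0) / 2), with weighted means and variances for the weighted version; causallib's own implementation may differ in details such as which variances go into the denominator. The toy data are hypothetical.

```python
import numpy as np

def weighted_mean_var(x, w):
    """Weighted mean and (biased) weighted variance of x."""
    mean = np.average(x, weights=w)
    var = np.average((x - mean) ** 2, weights=w)
    return mean, var

def asmd(x, a, w=None):
    """Absolute standardized mean difference of covariate x between a == 1 and a == 0.

    Assumed definition: |mean1 - mean0| / sqrt((var1 + var0) / 2).
    If w is None, every unit gets weight 1, giving the unweighted ASMD.
    """
    x = np.asarray(x, dtype=float)
    a = np.asarray(a)
    w = np.ones_like(x) if w is None else np.asarray(w, dtype=float)
    m1, v1 = weighted_mean_var(x[a == 1], w[a == 1])
    m0, v0 = weighted_mean_var(x[a == 0], w[a == 0])
    return abs(m1 - m0) / np.sqrt((v1 + v0) / 2)

# Toy binary covariate: units with x == 1 self-select into treatment more often
x = np.array([1] * 8 + [0] * 8)
a = np.array([1] * 6 + [0] * 2 + [1] * 2 + [0] * 6)

# Empirical propensity P(a = 1 | x): 6/8 when x == 1, 2/8 when x == 0
p = np.where(x == 1, 0.75, 0.25)
w = np.where(a == 1, 1 / p, 1 / (1 - p))

print(round(asmd(x, a), 3))     # unweighted: 1.155
print(round(asmd(x, a, w), 3))  # weighted: 0.0
```

Here the weights rebalance both groups to the same covariate mean (0.5), so the weighted ASMD drops to zero, which is the pattern a successful weighting should show on the Love plot.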
Thanks again for bringing this up, and good luck!
I am trying to use the library on a survey dataset where no entry is numerical and all the responses are categorical. When I used Causal Inference 360's evaluation plots, the results were not very encouraging, i.e. there were wide gaps between the weighted and unweighted variables in the propensity plots.
Also, the Boolean Rules via Column Generation (BRCG) method didn't return any rules, presumably because no entry was numerical. The output was:
Initial LP solved
Iteration: 1, Objective: 0.2203
Accuracy: 0.7797356828193832
AUC: 0.5
['']
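For context, an empty rule set behaves like a constant classifier, which is consistent with the AUC of 0.5 in the output above. The minimal sketch below assumes that, since BRCG learns a rule set (an OR of ANDs) for the positive class, an empty rule set predicts the negative class for every sample. The labels are hypothetical, not the poster's data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical labels (illustrative only): 78% negative, 22% positive
y_true = np.array([0] * 78 + [1] * 22)

# Assumption: an empty rule set (an OR over zero clauses) is never satisfied,
# so every sample gets the negative prediction, i.e. a constant classifier.
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_pred)
print(acc)  # 0.78: simply the share of the majority (negative) class
print(auc)  # 0.5: a constant score cannot rank positives above negatives
```

If that assumption holds, the reported accuracy mostly reflects the class balance of the target rather than any learned rule.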
Can this library be used to find causal relationships between categorical variables? If so, could you share a notebook or example for this?