blei-lab / causal-text-embeddings

Software and data for "Using Text Embeddings for Causal Inference"
MIT License

Question about training #6

Closed srn284 closed 4 years ago

srn284 commented 4 years ago

Dear Victor, Again super cool work and it's awesome that your work online is so well documented and put together.

I have extended this model to multiple treatments T={t1,t2,t3,...} (i.e. multinomial propensity prediction and binary outcomes with |T| conditional-outcome heads), and heavily oversampling minority treatment groups substantially changes the downstream causal estimates compared to not oversampling.

I was wondering: is there something theoretically incorrect about oversampling in a causal setting, or can I keep doing it because the results are more balanced?

Warm wishes and thank you for your time, Shishir

vveitch commented 4 years ago

Is your extension meant to allow multiple possible contrasts, or multiple treatments applied over time? That is, are you considering treatments T \in {0,1,2} and contrasts such as E[Y | do(T=2)] - E[Y | do(T=0)]? Or will you apply multiple treatments to each unit?

In the former case, it might be an issue of ATT vs. ATE. Computing the former shouldn't be too sensitive to downsampling. The ATE might be, though, because resampling changes the population of people you're taking the expectation over.
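The ATT/ATE distinction above can be made concrete with a small numpy sketch. This is not code from the repo; the arrays `q0`/`q1` stand in for hypothetical conditional-outcome predictions Q(0, x) and Q(1, x), and the simulated effect is deliberately larger for treated units so the two estimands differ:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
t = rng.binomial(1, 0.1, size=n)        # rare treatment: ~10% of units treated
q0 = rng.normal(0.0, 1.0, size=n)       # hypothetical predictions of E[Y | T=0, x]
q1 = q0 + 2.0 + 0.5 * t                 # treated units have a larger effect (2.5 vs 2.0)

# ATE: expectation of the plug-in contrast over ALL units
ate = np.mean(q1 - q0)

# ATT: same contrast, but averaged over the TREATED units only
att = np.mean(q1[t == 1] - q0[t == 1])
```

Resampling the training data shifts which units dominate the ATE average, while the ATT average is pinned to the treated subpopulation, which is one reason the former can be more sensitive.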

Btw, we also ran into a use case where we needed to allow multiple treatments (and outcome missingness). The code for that is here: https://github.com/vveitch/causal-text-embeddings-tf2 You may find it helpful.

Best,

Victor

srn284 commented 4 years ago

Dear Victor, sorry, I might have misunderstood you, but to answer your question: not over time; after a static point in time I have multiple potential outcomes. In the original Dragonnet scenario there are 2 treatments and 2 potential outcomes (so 2 conditional-outcome heads); I'm doing 5 treatments and 5 potential outcomes (5 conditional-outcome heads). I believe that's what you mean by multiple possible contrasts.
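The architecture being described (a shared representation feeding a multinomial propensity head plus one outcome head per treatment) can be sketched as a bare numpy forward pass. This is an illustration with placeholder weights, not the actual model code; `W_rep`, `W_g`, `W_q`, and `forward` are hypothetical names:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_treat, d, h = 5, 16, 8
rng = np.random.default_rng(0)

# placeholder weights: one shared representation layer,
# a multinomial propensity head, and n_treat outcome heads
W_rep = 0.1 * rng.normal(size=(d, h))
W_g = 0.1 * rng.normal(size=(h, n_treat))
W_q = 0.1 * rng.normal(size=(n_treat, h))

def forward(x, t):
    phi = np.maximum(x @ W_rep, 0.0)          # shared ReLU representation phi(x)
    g = softmax(phi @ W_g)                    # g_k(x) = P(T = k | x), multinomial propensity
    q_all = sigmoid(phi @ W_q.T)              # Q(k, x) for every treatment k, shape (n, 5)
    q_obs = q_all[np.arange(len(t)), t]       # select the head matching the observed treatment
    return g, q_all, q_obs

x = rng.normal(size=(4, d))
t = np.array([0, 2, 4, 1])
g, q_all, q_obs = forward(x, t)
```

In training, only the `q_obs` head for each unit's observed treatment receives an outcome-loss gradient, while the propensity head `g` is trained on all units.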

I thought about it more after posting; the data imbalance is severe, and some minority treatments occur in only about 1% of the data. In any case, I compute the ATE on the entire test set; I only train with minority-oversampled batches. My thinking is that the latent space feeding the propensity prediction is heavily biased toward the majority treatment(s) if I don't oversample, whereas with oversampling (and I might be wrong here) it feels a bit like a crude, batch-wise form of propensity matching. And the causal estimates are quite stable: low variance on the TMLE/naive estimates, and whether I train for 10 or 20 epochs the estimates don't move much.
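The train/estimate split being described (balanced batches for training, full test set for estimation) can be sketched with inverse-frequency sampling weights. This is a hypothetical illustration, not code from either repo:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# heavily imbalanced treatments: class 4 occurs in only ~1% of the data
t = rng.choice(5, size=n, p=[0.40, 0.30, 0.19, 0.10, 0.01])

# inverse-frequency weights: each treatment class gets equal total
# probability mass, so sampled batches are roughly balanced
counts = np.bincount(t, minlength=5)
w = 1.0 / counts[t]
w = w / w.sum()
batch_idx = rng.choice(n, size=256, replace=True, p=w)
batch_counts = np.bincount(t[batch_idx], minlength=5)

# downstream: causal contrasts such as E[Q(k, x)] - E[Q(0, x)] are still
# averaged over the FULL test set, so the oversampling only changes which
# examples the network sees during training, not the estimation population.
```

Whether that training-time reweighting is innocuous for the final estimates is exactly the question at issue; the sketch only shows the mechanics.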

And interesting, I didn't know you had done a sim with multiple treatments! Thanks for sharing.

Warm wishes, Shishir