Open Rachine opened 4 years ago
Oops, after some thinking, maybe to test this I should look at the goodness of fit with multiple variables, and not only at individual correlations.
I added the R^2 from an Ordinary Least Squares fit with statsmodels, using the formula 'y ~ z0 + z1 + z2'.
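To make the joint goodness-of-fit check concrete, here is a minimal sketch. It uses NumPy's least squares instead of statsmodels; with an intercept column it should give the same R^2 as an OLS fit of 'y ~ z0 + z1 + z2'. The helper name `ols_r_squared` and the synthetic data are assumptions for illustration only, not code from the project.

```python
import numpy as np

def ols_r_squared(y, Z):
    """R^2 of an OLS fit of y on the columns of Z plus an intercept (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    X = np.column_stack([np.ones(len(y)), Z])      # intercept + confounders
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
    resid = y - X @ beta
    ss_res = resid @ resid                         # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()           # total sum of squares
    return 1.0 - ss_res / ss_tot

# Synthetic illustration (assumed data): three confounders z0, z1, z2.
rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 3))
y_confounded = Z.sum(axis=1) + 0.5 * rng.normal(size=1000)  # y depends on all z
y_clean = rng.normal(size=1000)                             # y independent of z

r2_confounded = ols_r_squared(y_confounded, Z)
r2_clean = ols_r_squared(y_clean, Z)
print(f"R^2 confounded: {r2_confounded:.3f}, R^2 independent: {r2_clean:.3f}")
```

A high R^2 on the resampled test set would suggest the confounders still jointly explain y, even if each individual correlation looks small. Note that this only captures linear, additive links.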
Hello, thank you very much for tackling this issue of confounders, which seems to recur often in clinical ML problems.
I have some questions about the project/paper:
and
a deconfounded test set (with no data leakage of course)?
With k multiple confounders, I still used most of your codebase, together with a pseudo-generalization of the mutual information to multiple variables. The probability m_i of being sampled is now:
The quantity can still be estimated with kernel density estimation.
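As a rough illustration of estimating such a quantity with kernel density estimation, the sketch below computes a plug-in estimate of the mutual information I(y; z) with z treated as one joint vector of confounders. This is an assumption on my side: it is a plain Gaussian KDE with a Scott-style bandwidth written in NumPy, not the pseudo-generalization or the codebase discussed here, and the function names are hypothetical.

```python
import numpy as np

def kde_log_density(x):
    """Log of a Gaussian KDE fitted on x (n samples, d dims), evaluated at its own rows."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    # Standardize each column; the scaling factors cancel in the MI ratio below.
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    h = n ** (-1.0 / (d + 4))  # Scott-style bandwidth factor (an assumption)
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    log_kernel = -sq_dist / (2 * h**2) - d * np.log(h) - 0.5 * d * np.log(2 * np.pi)
    # Numerically stable log-mean-exp over the kernel centres.
    m = log_kernel.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_kernel - m).sum(axis=1)) - np.log(n)

def kde_mutual_information(y, z):
    """Plug-in estimate of I(y; z) in nats, with z treated jointly (shape (n, k))."""
    joint = np.column_stack([y, z])
    return np.mean(kde_log_density(joint) - kde_log_density(y) - kde_log_density(z))

# Synthetic illustration (assumed data): two additive confounders.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 2))
y_dep = z.sum(axis=1) + 0.3 * rng.normal(size=n)  # y driven by the confounders
y_indep = rng.normal(size=n)                      # y unrelated to the confounders

mi_dep = kde_mutual_information(y_dep, z)
mi_indep = kde_mutual_information(y_indep, z)
print(f"MI dependent: {mi_dep:.3f}, MI independent: {mi_indep:.3f}")
```

The estimate is biased by the kernel smoothing, and the bias grows with the number of confounders, so it is probably more useful for comparing samplings than as an absolute value.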
I made some quick toy examples. It seems to work approximately on simple additive toy examples, when the number of examples is sufficient.
For instance, with 1000 samples and 10 confounding factors I got:
For instance, with 100 samples and 3 confounding factors I got:
It would also be interesting to study the N required to be sure, at a certain level, of the deconfounding capability for k factors, considering the type of link.
Do you think this is a correct approach and generalization?
Thank you
Best regards