Open PetalsOnWind opened 3 years ago
@PetalsOnWind I am interested in working on this and have previous experience with the topic. Could you please assign it to me?
I am looking for a short introduction on why you would use dummy variables, why you use (n-1) dummies for n categories, and how to interpret the coefficients. You can also branch off into dummy variables as the dependent variable, modelled with logit/probit.
@PetalsOnWind We use (n-1) dummies for n categories to avoid the dummy variable trap. A full set of n dummy columns is perfectly multicollinear with the intercept: every row has exactly one 1 across the n columns, so once we know n-1 of them we can predict the nth exactly. Intuitively, it is like probabilities that must sum to 1: if we know n-1 of them, the nth is 1 minus the sum of the other n-1.
So for n categories we drop one (the reference category) to avoid this redundancy and keep n-1 dummies.
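A rough sketch of the point above, in plain Python with no libraries. The `colors` data and the helper names `one_hot` and `matrix_rank` are made up for illustration; they are not from any package. It assumes the model has an intercept column of ones, which is where the redundancy comes from.

```python
from fractions import Fraction

def one_hot(values, categories):
    """Return one 0/1 column per category, as a list of rows (hypothetical helper)."""
    return [[1 if v == c else 0 for c in categories] for v in values]

def matrix_rank(rows):
    """Rank by Gaussian elimination with exact fractions (hypothetical helper)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Look for a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column depends on earlier ones
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Toy categorical feature with n = 3 categories (made-up data).
colors = ["red", "green", "blue", "green", "red", "blue"]
categories = ["blue", "green", "red"]
dummies = one_hot(colors, categories)

# Intercept + all 3 dummies: 4 columns, but the dummies sum to the intercept.
full = [[1] + row for row in dummies]
# Intercept + 2 dummies ("blue" dropped as the reference category): 3 columns.
reduced = [[1] + row[1:] for row in dummies]

rank_full = matrix_rank(full)        # 3, fewer than the 4 columns
rank_reduced = matrix_rank(reduced)  # 3, equal to the 3 columns

# The dropped column carries no extra information: it is 1 minus the others.
recovered = [1 - sum(row[1:]) for row in dummies]

print(rank_full, rank_reduced)
print(recovered == [row[0] for row in dummies])
```

With all n dummies plus an intercept the design matrix has 4 columns but rank 3, so the regression coefficients are not uniquely determined. After dropping one category the matrix has full column rank, and each remaining coefficient reads as the difference from the dropped (reference) category.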
@saptarshimondal1305 You might try refining this, adding some proofs and code in a .ipynb file, and then sending a PR.
@PetalsOnWind I have made a .ipynb file, but how do I upload it here? I can't send a PR. How should I do it?
I would like to work on this issue under GSSoC'21. I have some idea of dummy variables. Please assign it to me.
Thank you for assigning this issue to me. :) I will try to make a PR as soon as possible.
@PetalsOnWind Do I have to make a README for this?
@PetalsOnWind I'm interested in contributing my ideas on this topic and working on it, but please give me a detailed explanation of the work that I need to do, as I'm new to open source.