Open choikwun opened 6 years ago
What is the purpose of this step, "determines which variables are correlated to each other", before performing PCA?
Could you help me better understand how performing PCA would contribute to running a linear regression on the PCA output?
While PCA reduces high-dimensional data, it produces output variables that are linear combinations of the inputs and can be used in regression. The PCA-transformed predictors may give better model performance metrics than the raw high-dimensional predictors.
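To make the PCA-then-regression idea concrete, here is a minimal sketch that uses NumPy only. The data is synthetic, and the names (`latent`, `scores`, `k`) and the choice of two components are assumptions for illustration, not part of the proposed module. It projects centered predictors onto the first `k` principal components and fits ordinary least squares on those scores.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Synthetic example (assumption): ten predictors that are noisy linear
# mixes of two hidden factors, and a response driven by the same factors.
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.05 * rng.normal(size=(n, 10))
y = latent @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=n)

# PCA via SVD of the centered data; rows of Vt are the principal axes.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                      # number of components kept (illustrative choice)
scores = Xc @ Vt[:k].T     # shape (n, k): the PCA output used as predictors

# Ordinary least squares on the component scores, with an intercept column.
A = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

pred = A @ coef
r2 = 1.0 - ((y - pred) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print("components kept:", k, "training R^2:", round(float(r2), 3))
```

Whether this beats a regression on the raw predictors depends on the data. The sketch only shows the mechanics, and the training R^2 it prints is not a held-out performance estimate.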
@omidkj, specifically picking only the variables that are correlated would make the PCA easier to explain.
As @EHWUSF said today, I still have to decide whether to scale first and then assess correlation, or to scale after assessing correlation. I believe that I should scale first and then assess correlation.
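One point that may help with the ordering question: the Pearson correlation coefficient does not change when each variable is standardized (mean subtracted, divided by its standard deviation). So if "scaling" means standardization and "correlation" means Pearson correlation, both orders give the same correlation matrix. The sketch below checks this on synthetic data; the data and the `standardize` helper are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): three columns on very different scales.
X = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 0.01])
X[:, 1] += 30.0 * X[:, 0]  # make column 1 correlated with column 0


def standardize(X):
    """Center each column and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


corr_raw = np.corrcoef(X, rowvar=False)                  # assess, no scaling
corr_scaled = np.corrcoef(standardize(X), rowvar=False)  # scale, then assess

same = np.allclose(corr_raw, corr_scaled)
print("correlation matrices match:", same)
```

This does not hold for other choices, for example a covariance-based criterion or a non-linear scaling. Scaling still matters for the PCA step itself, since PCA on unscaled data is dominated by the columns with the largest variance.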
High-dimensional data makes model training slow and is hard to visualize and conceptualize. Using PCA, we can reduce the number of dimensions and make the data more manageable. I propose a module which takes in a cleaned NumPy array, determines which variables are correlated to each other, performs scaling on the correlated variables, and performs PCA on the correlated variables. The output would be an np.array.
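A rough sketch of what such a module could look like, using NumPy only. The function names (`correlated_columns`, `pca_on_correlated`), the 0.5 correlation threshold, and the default of two components are all hypothetical choices made for this example. The sketch also assumes that no column is constant, since a constant column has an undefined correlation.

```python
import numpy as np


def correlated_columns(X, threshold=0.5):
    """Indices of columns whose absolute Pearson correlation with at
    least one other column reaches the threshold (hypothetical rule)."""
    corr = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(corr, 0.0)  # ignore each column's self-correlation
    return np.where(np.abs(corr).max(axis=0) >= threshold)[0]


def pca_on_correlated(X, threshold=0.5, n_components=2):
    """Select correlated columns, standardize them, and return their
    principal component scores together with the selected indices."""
    X = np.asarray(X, dtype=float)
    cols = correlated_columns(X, threshold)
    if cols.size == 0:
        raise ValueError("no columns exceed the correlation threshold")

    Z = X[:, cols]
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)  # scale the selected columns

    # PCA via SVD: U * S gives the scores of the rows on each component.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    k = min(n_components, S.size)
    return U[:, :k] * S[:k], cols


# Usage on synthetic data (assumption): columns 0 and 1 are strongly
# correlated, columns 2 to 4 are independent noise.
rng = np.random.default_rng(0)
data = rng.normal(size=(300, 5))
data[:, 1] = data[:, 0] + 0.1 * rng.normal(size=300)

scores, cols = pca_on_correlated(data)
print("selected columns:", cols, "output shape:", scores.shape)
```

Returning the selected column indices alongside the scores keeps the output traceable to the original variables, which should help when explaining the components.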