PCA (principal component analysis) is a method for summarizing a large set of variables in a dataset with a smaller set of new variables, called principal components. It extracts a low-dimensional set of features from a high-dimensional data set while capturing as much of the information (variance) as possible.
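To make the idea concrete, here is a minimal sketch of PCA on just two variables, written in plain Python. It is not the code used for this analysis (which appears to have been done in R); the function name pca_2d and the temp/atemp values are made up for illustration. For two variables the covariance matrix is 2x2, so its eigenvalues and first eigenvector can be written in closed form.

```python
import math
from statistics import mean

def pca_2d(xs, ys):
    """PCA of two variables via the closed-form eigen-decomposition
    of their 2x2 sample covariance matrix [[a, b], [b, c]]."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    # Center each variable before computing variances and covariance.
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    a = sum(d * d for d in dx) / (n - 1)
    c = sum(d * d for d in dy) / (n - 1)
    b = sum(p * q for p, q in zip(dx, dy)) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix (lam1 >= lam2).
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Eigenvector for lam1: the direction of the first principal component.
    if b != 0:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    pc1 = (vx / norm, vy / norm)
    # Scores: each centered row projected onto the first component.
    scores = [p * pc1[0] + q * pc1[1] for p, q in zip(dx, dy)]
    total = lam1 + lam2
    explained = lam1 / total if total > 0 else 0.0
    return pc1, scores, explained

# Illustrative values only, not taken from the bike data files.
temp = [9.8, 14.0, 18.2, 22.1, 26.3, 30.5]
atemp = [12.1, 16.5, 21.0, 25.8, 30.2, 34.1]
pc1, scores, explained = pca_2d(temp, atemp)
print(pc1, round(explained, 4))
```

When the two variables move together closely, the first component alone carries almost all of the variance, so the single column of scores can stand in for both original columns.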
PCA steps:
-Read the CSV files bike_train-set.csv and bike_test_data.csv.
-Displayed the column names and the number of rows in both data sets.
-Checked for NA values; the check returned false, i.e. no missing values were found.
-Converted the variables season, holiday, working day, and weather to factors in both data sets, as they are categorical.
-Took a random sample of 2000 rows from the train data and from the test data.
-Split the date-time column into its parts to give the algorithm more to analyze and to see how strongly the hour affects usage.
-Plotted hour vs. count, which showed that the hours 5pm-8pm had the highest user counts.
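The sampling and hour steps above can be sketched roughly as follows, in plain Python rather than the original tooling. The rows, the "YYYY-MM-DD HH:MM:SS" timestamp format, and the name mean_count_by_hour are assumptions made for illustration; the real files may be laid out differently.

```python
import random
from collections import defaultdict
from datetime import datetime

# Illustrative rows only (not taken from the real files): (datetime, count).
rows = [
    ("2011-01-01 08:00:00", 120),
    ("2011-01-01 17:00:00", 310),
    ("2011-01-02 17:00:00", 290),
    ("2011-01-02 03:00:00", 5),
]

# Random sample of up to 2000 rows; the fixed seed keeps it repeatable.
sample = random.Random(42).sample(rows, k=min(2000, len(rows)))

def mean_count_by_hour(rows, fmt="%Y-%m-%d %H:%M:%S"):
    """Extract the hour from each timestamp and return {hour: mean count}."""
    totals = defaultdict(lambda: [0, 0])  # hour -> [sum of counts, rows seen]
    for stamp, count in rows:
        hour = datetime.strptime(stamp, fmt).hour
        totals[hour][0] += count
        totals[hour][1] += 1
    return {h: s / n for h, (s, n) in sorted(totals.items())}

hourly = mean_count_by_hour(sample)
busiest_hour = max(hourly, key=hourly.get)
print(hourly, busiest_hour)
```

Plotting the hourly dictionary (hour on the x-axis, mean count on the y-axis) gives the same kind of hour vs. count view described in the last step.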
Reducing the dimension helps with visualization as well as storage, so to evaluate the correlation between variables I used cor(), which showed that temp and atemp are highly correlated.
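For reference, the kind of pairwise correlation matrix that cor() reports can be sketched in plain Python. This is a stand-in rather than the original call: the helper names pearson and correlation_matrix are hypothetical, and the temp, atemp, and humidity values are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """Return {name: {other_name: r}} for every pair of columns."""
    return {
        a: {b: pearson(columns[a], columns[b]) for b in columns}
        for a in columns
    }

# Illustrative values only, not taken from the bike data files.
columns = {
    "temp": [9.8, 14.0, 18.2, 22.1, 26.3, 30.5],
    "atemp": [12.1, 16.5, 21.0, 25.8, 30.2, 34.1],
    "humidity": [81, 75, 60, 70, 55, 62],
}
corr = correlation_matrix(columns)
for name, row in corr.items():
    print(name, {k: round(v, 3) for k, v in row.items()})
```

A pair with a coefficient close to 1, as temp and atemp have here, carries largely the same information, which is what makes it a good candidate for being merged into one component.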