Closed alexrods closed 2 years ago
Since this dataset was downloaded straight from Kaggle, our data is very clean. We have no normalization to do, no one-hot encoding to apply, and no reductions to make. The only issue came from the geolocation dataset; however, we were able to solve it by using Tableau. Because it contains thousands of data points, Python crashed on us, but Tableau processed it without trouble. There are a couple of outliers, but they don't impact the overall analysis.
I agree the data does not need much processing, but that is not because it comes from Kaggle (an example of an uncleaned dataset is this one from a movie dataset).
Still, I would say normalization may be required depending on the analysis we want to do. For example, if we wanted to run a regression analysis on the price of a product, normalization would be needed.
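To make the normalization point concrete, here is a minimal sketch of z-score standardization applied to feature columns before a regression. The column names (`freight`, `weight_g`) and their values are hypothetical and not taken from our dataset, and z-scoring is only one of several possible normalization choices.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical feature columns on very different scales.
freight = [10.0, 20.0, 30.0, 40.0]
weight_g = [500.0, 1500.0, 2500.0, 3500.0]

freight_z = standardize(freight)
weight_z = standardize(weight_g)
```

After this step both columns have mean 0 and standard deviation 1, so their regression coefficients can be compared on the same scale.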
Summary
Set up data pre-processing to clean and transform the data for the data model
Acceptance Criteria