C10-Brazilian-e-commerce-modeling-team / brazilian-e-commerce


chore: Clean and transform the data to the structure required for the data model. #20

Closed alexrods closed 2 years ago

alexrods commented 2 years ago

Summary

Set up data pre-processing to clean and transform the data into the structure required by the data model.

Acceptance Criteria

GabyGO2108 commented 2 years ago

Since this dataset was downloaded straight from Kaggle, our data is very neat. We have no normalization or one-hot encoding to do, nor any reductions to make. The only issue we ran into was with the geolocation dataset; however, we were able to solve it by using Tableau. Since there were thousands of data points, Python crashed on us, but Tableau processed them without trouble, and even though we had a couple of outliers, they don't impact the overall analysis.
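For reference, here is a minimal sketch of how the geolocation points could be thinned out in plain Python before plotting. It is not what we ran. The bounding-box values are approximate coordinates for Brazil, and the function name, the tuple layout and the sample rows are all made up for illustration.

```python
from collections import defaultdict

# Approximate bounding box for Brazil in degrees (assumed values).
LAT_MIN, LAT_MAX = -33.75, 5.27
LNG_MIN, LNG_MAX = -73.98, -34.79


def centroids_by_zip_prefix(rows):
    """Average the in-bounds (lat, lng) points for each zip prefix.

    rows: iterable of (zip_prefix, lat, lng) tuples (hypothetical layout).
    Returns a dict mapping zip_prefix -> (mean_lat, mean_lng).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for zip_prefix, lat, lng in rows:
        if not (LAT_MIN <= lat <= LAT_MAX and LNG_MIN <= lng <= LNG_MAX):
            continue  # skip points outside the box (likely outliers)
        acc = sums[zip_prefix]
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {z: (s[0] / s[2], s[1] / s[2]) for z, s in sums.items()}


# Made-up sample rows, not taken from the dataset.
rows = [
    ("01037", -23.5456, -46.6392),
    ("01037", -23.5460, -46.6400),
    ("20010", -22.9035, -43.1750),
    ("99999", 41.6140, -8.4110),  # falls outside the box, so it is dropped
]
centroids = centroids_by_zip_prefix(rows)
```

Collapsing the points to one centroid per zip prefix leaves far fewer rows to plot, and the bounding-box check drops the obvious outliers in the same pass.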

larispardo commented 2 years ago

I agree the data does not need much processing, but that is not because it comes from Kaggle (an example of an uncleaned dataset is this one from a movie dataset).

Still, I would say normalization may be required depending on the analysis we want to do. For example, if we wanted to run a regression analysis on the price of a product, normalization would be needed.
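To make the point concrete, here is a small sketch of z-score standardization, one common form of normalization. The feature names and values are hypothetical, and how much this matters depends on the model: it mainly affects regularized or gradient-based regression, where features on very different scales would otherwise be weighted unevenly.

```python
from statistics import mean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


# Hypothetical feature columns on very different scales.
freight_value = [8.0, 12.0, 20.0, 40.0]
product_weight_g = [200.0, 500.0, 1500.0, 9000.0]

freight_z = standardize(freight_value)
weight_z = standardize(product_weight_g)
```

After this step both columns have mean 0 and standard deviation 1, so neither dominates just because of its units.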