Closed by TobyDavids 1 year ago
There are no duplicates or missing values in any of the three datasets.
Fulfillment Dataset: extra spaces in the 'Warehouse Order Fulfillment (days)' column name.
Inventory Dataset: extra spaces in the 'Year Month' and 'Warehouse Inventory' column names.
Orders and Shipments: the 'Order Time' column is a string instead of a time (not sure if it needs transformation), and the 'Discount %' column is an object instead of a float.
Create a copy of the dataset to preserve the original data.
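The checks listed above can be reproduced with a few pandas calls. This is a minimal sketch, not the exact code that was run: the `raw` frame is a hypothetical stand-in for one of the three datasets, and `audit` is a helper name made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for one of the three datasets; the real files are not shown in this thread.
raw = pd.DataFrame({
    " Year Month": [201701, 201702, 201703],
    " Warehouse Inventory ": [12, 7, 30],
})

def audit(df: pd.DataFrame) -> dict:
    """Summarise duplicate rows, missing values and column names padded with spaces."""
    return {
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_values": int(df.isna().sum().sum()),
        "padded_columns": [c for c in df.columns if c != c.strip()],
    }

# Work on a copy so the original frame stays untouched.
inventory = raw.copy()
report = audit(inventory)
print(report)
print(inventory.dtypes)  # shows which columns are stored as object, int, float, ...
```

Running the same helper on each dataset gives one comparable summary per table, and the `dtypes` output is where an object-typed column such as 'Discount %' would show up.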
I used the strip method to remove any extra spaces in the column names. Here is the sample code that was used.
inventory.columns = inventory.columns.str.strip()
Change datatype of Discount from object to float
I began by displaying the unique values in the discount column.
orders_ship['Discount %'].unique()
I realized that orders that received no discount had their values recorded as '-' instead of 0, which is why the column is stored as text instead of a float. To correct this, we replace all the '-' values with 0 so that the column can be converted to float with the astype() method.
orders_ship['Discount %'] = orders_ship['Discount %'].replace(' - ', 0.0).astype(float)
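One caveat: `replace` with a plain string only matches values that are exactly equal to it, so the placeholder has to be spelled with the same surrounding spaces as in the data. The sketch below is a slightly more defensive variant that strips whitespace first. The sample values are hypothetical, since the real unique values are not listed in this thread.

```python
import pandas as pd

# Hypothetical sample; the actual unique values of 'Discount %' are an assumption.
orders_ship = pd.DataFrame({"Discount %": ["0.25", " - ", "-", "0.1", "  -  "]})

discount = (
    orders_ship["Discount %"]
    .astype(str)          # make sure the .str methods apply to every entry
    .str.strip()          # ' - ' and '  -  ' both become '-'
    .replace("-", "0")    # a bare dash means "no discount"
)

# errors="raise" (the default) surfaces any other unexpected text instead of hiding it.
orders_ship["Discount %"] = pd.to_numeric(discount, errors="raise")
print(orders_ship["Discount %"].dtype)
```

Using `pd.to_numeric` instead of `astype(float)` is a design choice, not a requirement: both fail loudly on leftover text, but `to_numeric` also offers `errors="coerce"` if unknown values should become NaN.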
Thank you, Irene. On my end, I carried out some cleaning and transformation steps.
I also normalized the data by creating two extra dimension tables (Customers and Products) to reduce redundancy in the fact table (Orders).
The dates were merged as well.
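For anyone following along, here is a rough sketch of what this kind of normalization and date merge can look like in pandas. It is only one possible reading of the steps above: every column name, the `Product ID` surrogate key, and the sample rows are assumptions made for illustration, not the actual schema.

```python
import pandas as pd

# Hypothetical flat orders table; all column names and values are assumptions.
orders = pd.DataFrame({
    "Order ID": [1, 2, 3],
    "Customer ID": [10, 11, 10],
    "Customer Country": ["Kenya", "Nigeria", "Kenya"],
    "Product Name": ["Tent", "Stove", "Tent"],
    "Product Category": ["Camping", "Camping", "Camping"],
    "Order Year": [2017, 2017, 2018],
    "Order Month": [1, 2, 3],
    "Order Day": [5, 14, 30],
    "Order Quantity": [2, 1, 4],
})

# Dimension tables: one row per distinct customer and per distinct product.
customers = orders[["Customer ID", "Customer Country"]].drop_duplicates().reset_index(drop=True)
products = orders[["Product Name", "Product Category"]].drop_duplicates().reset_index(drop=True)
products.insert(0, "Product ID", products.index + 1)  # made-up surrogate key

# Merge the separate date parts into a single datetime column.
date_parts = orders[["Order Year", "Order Month", "Order Day"]].rename(
    columns={"Order Year": "year", "Order Month": "month", "Order Day": "day"}
)
orders["Order Date"] = pd.to_datetime(date_parts)

# Fact table: keep keys and measures, drop attributes now held in the dimensions.
fact_orders = orders.merge(products, on=["Product Name", "Product Category"])[
    ["Order ID", "Customer ID", "Product ID", "Order Date", "Order Quantity"]
]
print(fact_orders)
```

The fact table then carries only keys, the merged date, and measures, while descriptive attributes live once in `customers` and `products`.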
Can we meet by 7pm (that's 9pm on your end) so we can look at the cleaned dataset and plan the next line of action? Thank you once again.
That's great work. Looking forward to the session.
@Irene-arch cleans and transforms the dataset for analysis. Deadline: Tuesday, 29th August.