Irene-arch / Supply-Chain-Analytics-Python_PowerBI

This project is part of a competition that was held by DataCamp.

Data Cleaning, Transformation #2

Closed TobyDavids closed 1 year ago

TobyDavids commented 1 year ago

@Irene-arch cleans and transforms the dataset for analysis. Deadline: Tuesday, 29th August.

Irene-arch commented 1 year ago

There are no duplicates or missing values within all three datasets.
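
A check like this can be sketched as follows. This is only an illustration: the helper name quality_report and the sample frame are hypothetical stand-ins, not the actual competition data or the exact code that was run.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Count fully duplicated rows and missing cells in a DataFrame."""
    return {
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_values": int(df.isna().sum().sum()),
    }


# Hypothetical stand-in for one of the three datasets.
sample = pd.DataFrame(
    {"Product Name": ["A", "B", "C"], "Warehouse Inventory": [10, 0, 5]}
)
report = quality_report(sample)
print(report)
```

Running the same helper on each of the three datasets gives a quick side-by-side view before any cleaning starts.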

Dataset issues

Fulfillment Dataset

Inventory Dataset

Orders and Shipments

Fixing dataset issues

Create a copy of the dataset to preserve the original data.
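
As a minimal sketch of that step, assuming the data is already loaded into a pandas DataFrame (the name inventory_raw and its contents are made up for illustration):

```python
import pandas as pd

# Hypothetical raw data; the real inventory dataset has different contents.
inventory_raw = pd.DataFrame(
    {"Product Name": ["A", "B"], "Warehouse Inventory": [10, 0]}
)

# DataFrame.copy() makes a deep copy by default, so edits to the working
# copy do not touch the original.
inventory = inventory_raw.copy()
inventory.loc[0, "Warehouse Inventory"] = 99

print(inventory_raw.loc[0, "Warehouse Inventory"])  # original value is unchanged
```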

I used the str.strip method to remove leading and trailing spaces from the column names. Here is the code that was used.

inventory.columns = inventory.columns.str.strip()

Change datatype of Discount from object to float

I began by displaying the unique values in the discount column.

orders_ship['Discount %'].unique()

I found that orders with no discount were recorded as '  -  ' (a dash padded with spaces) instead of 0, which is why the column was read as text (object) rather than float. To correct this, we replace those values with 0 so that the column can be converted to float using the astype() method.

orders_ship['Discount %'] = orders_ship['Discount %'].replace('  -  ', 0.0).astype(float)
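
A slightly more defensive variant is sketched below. It is not the code that was run on the project; it assumes the dash may appear with varying amounts of padding, and the sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample mimicking the raw 'Discount %' column.
orders_ship = pd.DataFrame({"Discount %": ["0.05", "  -  ", "0.10", " - "]})

# Strip padding first so every variant of the dash collapses to '-'.
discount = orders_ship["Discount %"].astype(str).str.strip()

# A lone dash means "no discount", so map it to zero.
discount = discount.replace("-", "0")

# to_numeric raises on any other unexpected text instead of hiding it.
orders_ship["Discount %"] = pd.to_numeric(discount, errors="raise").astype(float)

print(orders_ship["Discount %"].dtype)
```

Stripping before replacing avoids depending on the exact number of spaces around the dash, and errors="raise" surfaces any other non-numeric value that the unique() check might have missed.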

TobyDavids commented 1 year ago

Thank you, Irene. On my end, I carried out some cleaning and transformation steps.

I also normalized the data by creating two extra dimension tables (Customers and Products) to reduce redundancy in the fact table (Orders).
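
For anyone following along, this kind of normalization can be sketched in pandas roughly as below. All column names, the surrogate Product ID, and the sample rows are assumptions for illustration; the actual split may have been done differently (for example in Power BI).

```python
import pandas as pd

# Hypothetical flat orders table; every column name here is an assumption.
orders = pd.DataFrame({
    "Order ID": [1, 2, 3],
    "Customer ID": [100, 101, 100],
    "Customer Country": ["France", "Kenya", "France"],
    "Product Name": ["Widget", "Widget", "Gadget"],
    "Product Category": ["Tools", "Tools", "Toys"],
    "Order Quantity": [2, 1, 5],
})

# Customers dimension: one row per customer.
customers = (
    orders[["Customer ID", "Customer Country"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

# Products dimension: one row per product, with a surrogate key.
products = (
    orders[["Product Name", "Product Category"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
products.insert(0, "Product ID", range(1, len(products) + 1))

# Fact table: keep only keys and measures, looking up the product key.
fact_orders = orders.merge(products, on=["Product Name", "Product Category"])[
    ["Order ID", "Customer ID", "Product ID", "Order Quantity"]
]

print(fact_orders)
```

The descriptive attributes now live once in each dimension table, and the fact table carries only the keys needed to join back to them.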

The dates were merged as well.
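
One possible reading of this step, assuming it means combining separate year, month, and day columns into a single date column, is sketched below. The column names and values are hypothetical, and the merge may well have been done another way.

```python
import pandas as pd

# Hypothetical date parts; the real column names may differ.
orders = pd.DataFrame({
    "Order Year": [2023, 2023],
    "Order Month": [8, 12],
    "Order Day": [29, 1],
})

# pd.to_datetime can assemble dates from a frame whose columns are
# named 'year', 'month', and 'day', so rename the parts first.
parts = orders[["Order Year", "Order Month", "Order Day"]].rename(
    columns={"Order Year": "year", "Order Month": "month", "Order Day": "day"}
)
orders["Order Date"] = pd.to_datetime(parts)

print(orders["Order Date"])
```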

Can we meet at 7pm (that's 9pm on your end) so we can look at the cleaned dataset and plan the next line of action? Thank you once again.

Irene-arch commented 1 year ago

That's great work. Looking forward to the session.