Bellabeat Case Study

Author: Fiza Aslam

Date: 18/02/2023

This case study is the capstone project of the Google Data Analytics Course. I use the six-step data analysis process: Ask, Prepare, Process, Analyze, Share and Act.

Background

Bellabeat is a high-tech manufacturer of health products for women. It is currently a small company that would like to grow in the smart device market. The CEO believes that analyzing smart device data will help the company find opportunities to grow in a market where technology is always evolving. Our team has therefore been asked to analyze smart device data to see how consumers use their smart devices, and to provide high-level recommendations for Bellabeat's marketing strategy.

1. Ask

Business Task: Analyze smart device usage data to see how consumers use non-Bellabeat products. Summarize the insights in a report of the analysis and a presentation on how Bellabeat should proceed with its products and marketing strategy.

Key stakeholders: women who use health products, Urska (CEO of Bellabeat), Sando (cofounder), executive team, Bellabeat marketing team, team of data analysts.

2. Prepare

Data source: https://www.kaggle.com/datasets/arashnic/fitbit (CC0: Public Domain, dataset made available through Mobius).

This dataset was generated by respondents to a survey distributed via Amazon Mechanical Turk between 12/03/2016 and 12/05/2016. Thirty eligible Fitbit users consented to the submission of their personal tracker data. It contains 18 datasets, including several types of daily activity. Only 6 of them are used here: daily activity, weight log, sleep, hourly steps, hourly intensities and hourly calories. The remaining datasets are not relevant to this analysis.

Are there any issues with bias or credibility of the data source? Does it ROCCC?

The dataset is released under the CC0: Public Domain license, which means the creator has waived their copyright in the data.

3. Process

Tools used: Excel for data cleaning and RStudio for analysis and visualization. To ensure data integrity, the limitations of the datasets are addressed by cleaning the data in Excel. Cleaning improves the overall accuracy, completeness and consistency of the data.

Data cleaning process for each dataset:

Data mapping: the datasets must be compatible so that they can be merged in the analysis stage. Using the text-to-columns function, headings and cell formats were made identical across all datasets (consistent naming conventions). All date columns in each dataset were also formatted as ‘date’ to ensure compatibility.
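
The same idea can also be expressed in R. The following is only a minimal sketch under stated assumptions: the two toy data frames, their values, the raw date formats and the helper `standardize_date` are hypothetical and are not part of the original Excel workflow. It shows how differently named date columns could be brought to one shared name and type before merging.

```r
# Toy stand-ins for two exported tables; all values are invented for illustration.
activity_raw <- data.frame(
  Id = c(1, 1),
  ActivityDate = c("4/12/2016", "4/13/2016"),
  TotalSteps = c(8000, 9500)
)
sleep_raw <- data.frame(
  Id = c(1, 1),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(400, 450)
)

# Hypothetical helper: parse the named column as a Date, store it under the
# shared name "Date", and drop the original column.
standardize_date <- function(df, date_col, format) {
  df$Date <- as.Date(df[[date_col]], format = format)
  df[[date_col]] <- NULL
  df
}

# as.Date() ignores the trailing time part once the format has been matched.
activity_clean <- standardize_date(activity_raw, "ActivityDate", "%m/%d/%Y")
sleep_clean <- standardize_date(sleep_raw, "SleepDay", "%m/%d/%Y")

# Both tables now share a column name and type that can serve as a merge key.
print(names(activity_clean))
print(names(sleep_clean))
print(identical(activity_clean$Date, sleep_clean$Date))
```

Keeping the key columns identical in name and type is what makes a later merge on more than one column straightforward.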

4. Analyze

The cleaned datasets from Excel have been imported and are now ready for analysis. The visualizations are included in this stage rather than the Share stage because they are created in RStudio.

Creating dataframes.

# Load dplyr (for %>%, select, mutate, count) and ggplot2 (for the plots)
library(dplyr)
library(ggplot2)

# Read each cleaned CSV file into its own data frame
daily_activity <- read.csv("dailyActivity_merged.csv")
sleep_day <- read.csv("sleepDay_merged.csv")
hourly_calories <- read.csv("hourlyCalories_merged.csv")
hourly_intensities2 <- read.csv("hourlyIntensities_merged.csv")
hourly_steps <- read.csv("hourlySteps_merged.csv")
weight_log <- read.csv("weightLogInfo_merged.csv")

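Merging the daily activity dataset with the sleep and weight log datasets.

# Note: joining on "Id" alone pairs every sleep (or weight) record of a user
# with every activity record of that user, so rows are repeated in the result.
daily_activity_with_sleep <- merge(sleep_day, daily_activity, by="Id")
daily_activity_with_weight <- merge(weight_log, daily_activity, by="Id")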
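
To illustrate how the choice of merge key affects the result, here is a small sketch on invented data. The shared `Date` column is an assumption made for this example; the column names in the cleaned files may differ. Merging on `Id` alone produces one row for every combination of a user's sleep and activity records, while merging on `Id` and `Date` keeps one row per user per day.

```r
# Invented records for two users; not taken from the Fitbit data.
activity <- data.frame(
  Id = c(1, 1, 2),
  Date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(8000, 9500, 4000)
)
sleep <- data.frame(
  Id = c(1, 1, 2),
  Date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalMinutesAsleep = c(400, 450, 380)
)

# Key = Id only: user 1 has 2 x 2 combinations, user 2 has 1 x 1.
by_id_only <- merge(sleep, activity, by = "Id")

# Key = Id and Date: each sleep record meets only the same day's activity record.
by_id_and_date <- merge(sleep, activity, by = c("Id", "Date"))

print(nrow(by_id_only))      # 5
print(nrow(by_id_and_date))  # 3
```

Any count taken on a frame merged by `Id` alone therefore counts combinations of records rather than days or people.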

Creating summary statistics.

daily_activity %>%  
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes) %>%
  summary()

sleep_day %>%  
  select(TotalSleepRecords,
         TotalMinutesAsleep,
         TotalTimeInBed) %>%
  summary()

hourly_calories %>%  
  select(ActivityHour, Calories) %>%
  summary()

hourly_intensities2 %>%  
  select(ActivityHour, TotalIntensity, AverageIntensity) %>%
  summary()

hourly_steps %>%  
  select(ActivityHour, StepTotal) %>%
  summary()

weight_log %>%  
  select(WeightKg, WeightPounds, Fat, BMI, IsManualReport) %>%
  summary()

[Figure: summary statistics output]

Converting activity date to weekdays.

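# weekdays() returns day names in the current locale (English names assumed here)
daily_activity <- daily_activity %>%
  mutate(Weekday = weekdays(as.Date(ActivityDate, "%m/%d/%Y")))
# Order the days Monday to Sunday so that the plot is not sorted alphabetically
daily_activity$Weekday <- ordered(daily_activity$Weekday, levels=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"))
# Count the activity records logged on each weekday
activity_data <- daily_activity %>%
  group_by(Weekday) %>%
  summarize(count = n())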

Plotting activity_data by weekday to see on which days individuals use the tracker the most.

ggplot(activity_data, aes(x=Weekday, y=count)) +
  geom_bar(stat="identity",color="black",fill="purple") +
  labs(title="Daily activity usage", x="Days", y="Count") 

[Figure: bar chart "Daily activity usage", count of records by day]

Plotting relationship between steps taken and sedentary minutes.

ggplot(data=daily_activity, aes(x=TotalSteps, y=SedentaryMinutes)) +
  geom_point(colour="purple", size=0.5) +
  geom_smooth(method="lm") +
  labs(title="Relationship between total steps and sedentary minutes", x="Total steps", y="Sedentary minutes")

[Figure: scatter plot "Relationship between total steps and sedentary minutes"]

Plotting relationship between total steps taken and calories burnt.

ggplot(data=daily_activity, aes(x=TotalSteps, y=Calories)) +
  geom_point(color="purple", size=0.5) +
  labs(title="Relationship between total steps and calories burnt", x="Total Steps", y="Calories") +
  geom_smooth(method='loess', formula='y ~ x')

[Figure: scatter plot "Relationship between total steps and calories burnt"]
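
A scatter plot shows the direction of a relationship; a correlation coefficient and a simple linear fit put a number on it. The sketch below uses invented values, not the Fitbit data, and only base R. It is one possible way to quantify the relationship, not a step from the original analysis.

```r
# Invented daily totals for illustration; not taken from the Fitbit data.
activity <- data.frame(
  TotalSteps = c(1500, 4200, 6100, 8300, 10400, 12900),
  Calories   = c(1650, 1900, 2050, 2400, 2350, 2900)
)

# Pearson correlation: values near 1 indicate a strong positive linear relationship.
steps_calories_cor <- cor(activity$TotalSteps, activity$Calories)

# Simple linear model: the slope estimates the extra calories per additional step.
fit <- lm(Calories ~ TotalSteps, data = activity)

print(steps_calories_cor)
print(coef(fit))
```

With the real data, the same two calls could be run on `daily_activity$TotalSteps` and `daily_activity$Calories`.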

Plotting relationship between total minutes asleep and total time in bed.

ggplot(data=sleep_day, aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) +
  geom_point(colour="purple", size=0.5) +
  labs(title="Relationship between time asleep and time in bed",x="Time Asleep",y="Time in Bed") +
  geom_smooth(method="lm") 

[Figure: scatter plot "Relationship between time asleep and time in bed"]

Counting the records that fall below and above the recommended amount of sleep (7-9 hours, 420-540 minutes).

daily_activity_with_sleep %>%
  count(TotalMinutesAsleep < 420)
daily_activity_with_sleep %>%
  count(TotalMinutesAsleep > 540)

From these counts, the proportion in each group is calculated and a pie chart is constructed in Excel.

[Figure: output of the two counts above]

[Figure: pie chart of sleep duration groups]
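
The proportions can also be computed directly in R instead of Excel. The following is a sketch on invented values using only base R; the category labels are assumptions chosen for this example. The thresholds follow the 420 and 540 minute limits used in the counts above.

```r
# Invented sleep durations in minutes; not taken from the Fitbit data.
minutes_asleep <- c(300, 410, 420, 455, 480, 540, 560, 600)

# Assign each record to a group: below, within, or above the 7-9 hour range.
sleep_category <- ifelse(
  minutes_asleep < 420, "Less than 7 hours",
  ifelse(minutes_asleep > 540, "More than 9 hours", "7 to 9 hours")
)

# Absolute counts per group, then the share of each group.
counts <- table(sleep_category)
proportions <- prop.table(counts)

print(counts)
print(round(100 * proportions, 1))
```

Passing `counts` to base R's `pie()` would draw a chart comparable to the one built in Excel.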

Plotting hourly calories burnt.

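# `time` is assumed to be an hour-of-day column derived from ActivityHour during cleaning
ggplot(data=hourly_calories, aes(x=time, y=Calories)) +
  # Bar height is the mean of Calories for each hour, matching the plot title
  geom_bar(stat="summary", fun="mean", fill="purple") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title="Average calories burnt by the hour", x="Time", y="Calories")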

[Figure: bar chart "Average calories burnt by the hour"]
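
The plot relies on an hour-of-day column and on averaging across days. As a hedged sketch of how both could be produced in base R, the example below parses a timestamp column and averages calories per hour. The data values and the timestamp format are assumptions for illustration; the cleaned files may store the time differently.

```r
# Invented hourly records; the timestamp format is an assumption.
hourly <- data.frame(
  ActivityHour = c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM",
                   "4/13/2016 12:00:00 AM", "4/13/2016 1:00:00 PM"),
  Calories = c(60, 120, 80, 100)
)

# Parse the 12-hour timestamps (%p reads AM/PM and depends on an English-style locale).
parsed <- as.POSIXct(hourly$ActivityHour,
                     format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Keep only the time of day so that the same hour on different days groups together.
hourly$time <- format(parsed, "%H:%M")

# Average calories for each hour of the day.
avg_by_hour <- aggregate(Calories ~ time, data = hourly, FUN = mean)
print(avg_by_hour)
```

Plotting `avg_by_hour` with `geom_col()` would give one bar per hour whose height is the average rather than the sum over all records.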

5. Share

The findings from the analysis stage are shared here, supported by the visualizations created above.

Key Findings

6. Act

The findings will be acted on by presenting a summary of the analysis to stakeholders. From these findings, high-level recommendations are made on how Bellabeat should improve its marketing strategy.

Recommendations

Importantly, to act on the above recommendations, Bellabeat should invest substantially in its data science capabilities and in developing its algorithms. As technological advancement is accelerating, investing in technology to improve its services will help Bellabeat stay ahead of competitors. It should also promote its products through social media.