Data
We use data from the National Retail Federation (NRF) in the United States on consumer spending for Valentine's Day. We chose this dataset to find out how consumers plan to celebrate Valentine's Day, including total spending, average spending, the types of gifts planned, and spending per type of gift. It also provides demographic breakdowns by age group and gender. With this dataset, we can identify gift-giving trends in the United States across all age groups and use them to choose suitable gifts for the women we love. The dataset comprises three distinct files: "gifts_age.csv", "gifts_gender.csv", and "historical_spending.csv".
Question: How does the spending on different gift categories (e.g., Candy, Flowers, Jewelry, Greeting Cards, Evening Out, Clothing, and Gift Cards) vary across different age groups and between genders?
In this section, we explore the relationship between age, gender, and the various gift categories. To do so, we analyze each gift category by age group and by gender, using data from the "gifts_age.csv" and "gifts_gender.csv" files.
Since we want to explore how men and women spend differently on a specific category, we use a grouped bar chart, which makes comparisons across age groups and genders straightforward. We also investigate some other aspects of the data with a pie chart, which is best suited to comparing percentages of a whole. Other charts, such as the stacked bar chart, are less appropriate for two reasons:
Difficulty in Comparing Individual Segments: Stacked bar charts make it hard to compare the size of individual segments across different bars, because segments above the bottom one do not share a common baseline. The viewer's ability to judge segment lengths accurately is impaired when the segments start at different heights.
Complexity with Many Categories: When there are many segments within each bar, the chart can become cluttered and overwhelming, making it difficult to extract meaningful insights. This complexity can lead to misinterpretation of the data.
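The baseline problem can be made concrete with a small sketch in base R. The percentages below are hypothetical placeholders, not values from the NRF files, and the names `toy` and `segment_start` are introduced only for this illustration.

```r
# Hypothetical percentages, NOT taken from the NRF files.
# matrix() fills column by column: first column is Men, second is Women.
toy <- matrix(
  c(50, 30, 20,
    40, 35, 25),
  nrow = 3,
  dimnames = list(c("Candy", "Flowers", "Jewelry"), c("Men", "Women"))
)

# In a stacked bar, each segment starts where the segments below it end,
# so the starting height is the cumulative sum minus the segment itself.
segment_start <- apply(toy, 2, function(v) cumsum(v) - v)
print(segment_start)

# In a grouped bar, every bar starts at zero, so lengths are directly comparable.
grouped_start <- toy * 0
print(grouped_start)
```

In this toy case only the bottom segment (Candy) starts at the same height in both stacked bars, while every bar in the grouped layout shares the zero baseline.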
library(ggplot2)
library(tidyr)
library(readr) # Assuming this might be needed for read_csv if the default read.csv isn't used
# Make sure to correctly read the dataset into 'data'
data <- read.csv("./data/gifts_gender.csv") # Adjust path as needed
# Double-check the 'data' is correctly loaded
# head(data)
# Correctly apply pivot_longer to transform the data
data_long <- pivot_longer(data, cols = -Gender, names_to = "GiftType", values_to = "Percentage")
# Define one dodge position so that bars and their text labels line up
dodge <- position_dodge(width = 0.5)
# Create the horizontal grouped bar plot
ggplot(data_long, aes(x = GiftType, y = Percentage, fill = Gender)) +
geom_bar(stat = "identity", position = dodge, width = 0.5) +
geom_text(aes(label = sprintf("%.1f%%", Percentage), y = Percentage + 2), position = dodge, hjust = 0.5, size = 3.0) +
coord_flip() + # Flip the axes so that the bars run horizontally
theme_minimal() +
# labs() refers to the aesthetics, so x is still the gift type after the flip
labs(title = "Average Percentage Spending on Gift Types by Gender",
x = "Gift Type",
y = "Average Percentage Spending",
fill = "Gender") +
theme(axis.text.y = element_text(angle = 0, hjust = 1))
The figure shows that men direct the largest share of their spending (56%) towards flowers for Valentine's Day, whereas women spend mostly (59%) on candy. The most notable difference is in the flower category, where men's spending is highest.
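The comparison between genders can also be checked numerically rather than read off the chart. The following is a minimal sketch in base R that assumes a wide table shaped like "gifts_gender.csv" (one row per gender, one column per category). The values in `toy_gender` are made-up placeholders, not the NRF figures.

```r
# Hypothetical wide table; the numbers are placeholders for illustration only
toy_gender <- data.frame(
  Gender  = c("Men", "Women"),
  Candy   = c(50, 57),
  Flowers = c(55, 18),
  Jewelry = c(30, 14)
)

# Extract each gender's row as a named numeric vector (dropping the Gender column)
men   <- unlist(toy_gender[toy_gender$Gender == "Men", -1])
women <- unlist(toy_gender[toy_gender$Gender == "Women", -1])

# Positive values mean men are higher, negative values mean women are higher
gap <- men - women

# Category with the largest absolute difference between genders
largest_gap <- names(gap)[which.max(abs(gap))]
print(gap)
print(largest_gap)
```

With real data, the same steps would apply after replacing `toy_gender` with the table read from the CSV file.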
# Load necessary libraries
library(ggplot2)
library(tidyr) # Provides pivot_longer()
# Read the dataset
gifts_age <- read.csv("./data/gifts_age.csv")
# Reshape the data from wide to long format to facilitate plotting
gifts_long <- pivot_longer(gifts_age,
cols = Candy:GiftCards,
names_to = "GiftCategory",
values_to = "Percentage")
# Plot
ggplot(gifts_long, aes(x = Age, y = Percentage, fill = GiftCategory)) +
geom_bar(stat = "identity", position = "dodge") +
theme_minimal() +
scale_fill_manual(values = c("#FFBCB4", "#FFD64C", "#00BA38", "#55FF8A", "#00B9F6", "#B0E1DD", "#C77CFF"))+
labs(title = "Spending on Gift Categories Across Age Groups",
x = "Age Group",
y = "Percentage",
fill = "Gift Category") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
When considering the impact of age on Valentine's Day celebrations, a contrast emerges between younger and older consumers: younger people are more inclined to purchase candy than greeting cards.
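One way to quantify this kind of contrast is to subtract one category from another within each age group. The sketch below uses base R with hypothetical age labels and percentages. Neither is taken from "gifts_age.csv", and the column `CandyMinusCards` is introduced only for this example.

```r
# Hypothetical age groups and percentages, for illustration only
toy_age <- data.frame(
  Age           = c("Younger", "Older"),
  Candy         = c(70, 40),
  GreetingCards = c(30, 45)
)

# Positive values: candy is more popular than greeting cards in that group
toy_age$CandyMinusCards <- toy_age$Candy - toy_age$GreetingCards
print(toy_age)
```

A sign change in the difference between the youngest and oldest groups would indicate that the preference reverses with age.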
Question: How have the overall spending on celebrating, the per-person spending, and the spending on different gift categories (e.g., Candy, Flowers, Jewelry, Greeting Cards, Evening Out, Clothing, and Gift Cards) changed over the years, and how do these trends relate to economic factors or events?
First, we explore what share of people celebrated Valentine's Day in the period 2010-2022. To do this, we use the "PercentCelebrating" column from the "historical_spending.csv" file.
The figure shows that a majority of people take part in Valentine's Day celebrations. To gain deeper insight into this trend, we examine how the share of participants changes over the years and look for patterns, using a line chart of the data from "historical_spending.csv". We also mark the COVID-19 outbreak on the chart to explore its impact on people's behavior.
library(ggplot2)
library(dplyr)
# Assuming 'data' has been read from "historical_spending.csv"
# Ensure this line is correctly loading your data
data <- read.csv("./data/historical_spending.csv")
# Average share of people celebrating across all years, and its complement
meanCele = mean(data$PercentCelebrating)
meanNot = 100 - meanCele
newdata <- data.frame(
group=c("Celebrating", 'Not celebrating'),
value=c(meanCele, meanNot)
)
newdata <- newdata |>
arrange(desc(group)) |>
mutate(prop = value / sum(newdata$value) *100) |>
mutate(ypos = cumsum(prop)- 0.5*prop )
# Basic piechart
ggplot(newdata, aes(x="", y=prop, fill=group)) +
geom_bar(stat="identity", width=1, color="white") +
coord_polar("y", start=0) +
theme_void() +
theme(legend.position="none") +
geom_text(aes(y = ypos, label = group), color = "white", size=6) +
scale_fill_brewer(palette="Set1")
The figure reveals a declining trend in the share of people celebrating Valentine's Day. This downward trajectory was interrupted in 2020, the year of the COVID-19 outbreak and the subsequent national lockdowns. Despite this temporary resurgence, the decline has persisted since the pandemic was brought under control. In the most recent years of data, we observe that more than half (53 percent) of U.S. consumers planned to celebrate the holiday in 2022, up from 52 percent in 2021.
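The direction of such a trend can be summarized with year-over-year differences and a fitted linear slope. The sketch below uses base R. The 2021 and 2022 values come from the text above, while the earlier values in `toy_celebrating` are placeholders assumed only for this illustration.

```r
# 2021 and 2022 are quoted in the text; 2018-2020 are hypothetical placeholders
toy_celebrating <- data.frame(
  Year = 2018:2022,
  PercentCelebrating = c(55, 51, 55, 52, 53)
)

# Year-over-year change in percentage points (the first year has no predecessor)
toy_celebrating$Change <- c(NA, diff(toy_celebrating$PercentCelebrating))

# Slope of a simple linear fit: a negative value suggests an overall decline
fit <- lm(PercentCelebrating ~ Year, data = toy_celebrating)
slope <- coef(fit)[["Year"]]

print(toy_celebrating)
print(slope)
```

A single rebound year shows up as a positive `Change` even when the fitted slope over the whole period is negative.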
library(ggplot2)
library(emojifont)
library(emoGG)
# search_emoji("heart")
# Assuming 'data' has been read from "historical_spending.csv"
# Ensure this line is correctly loading your data
data <- read.csv("./data/historical_spending.csv")
# Create a new column for labels, spacing out every two years
data$Label <- ifelse(seq_along(data$Year) %% 2 == 0, paste0(data$PercentCelebrating, "%"), NA)
# Adjust the annotation position for the COVID-19 label
annotation_y_position <- max(data$PercentCelebrating, na.rm = TRUE) * 0.95 # Adjust vertically to avoid overlap
annotation_x_position <- 2020 - 2 # Move text to the left of the vertical line
spline_int <- as.data.frame(spline(data$Year, data$PercentCelebrating))
# Create a line plot with a COVID-19 pandemic indicator
ggplot(data, aes(x = Year, y = PercentCelebrating)) + geom_point()+
geom_emoji(emoji = "2764")+
geom_line(data = spline_int, aes(x = x, y = y),color="#c90970", size=1.0) + # Draw the line
geom_label(
aes(label = Label),
nudge_x = 0.45,
nudge_y = 0.45
)+ # geom_label() does not support check_overlap, so it is omitted here
geom_vline(xintercept = 2020, linetype = "dashed", color = "black") + # Add a vertical line for the pandemic start
annotate("text", x = annotation_x_position, y = annotation_y_position, label = "COVID-19 Pandemic Start", vjust = -0.5, color = "black", angle = 0) + # Adjusted annotation
theme_minimal() + # Use a minimal theme
labs(title = "Percentage of People Celebrating Over the Years",
x = "Year",
y = "Percentage Celebrating") +
scale_x_continuous(breaks = data$Year) # Ensure all years are shown
Similarly, we use a line chart to illustrate the trend in Valentine's Day spending. Using the same dataset and event indicator, we observe the opposite pattern: spending on celebrations rose between 2010 and 2022. On average, consumers anticipate spending $185.81 each, nearly $8 more than the average Valentine’s Day expenditure over the past five years. In contrast, average spending per person in 2010 was only $103.
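The comparison of the latest figure against the average of the preceding five years can be expressed as a small helper. This is a sketch in base R: the function `diff_from_trailing_mean` and the series `toy_per_person` are hypothetical and are not part of the original analysis or of the NRF data.

```r
# Difference between the last value and the mean of the k values before it
diff_from_trailing_mean <- function(x, k = 5) {
  n <- length(x)
  stopifnot(n > k)  # need at least k earlier observations
  x[n] - mean(x[(n - k):(n - 1)])
}

# Hypothetical per-person spending in dollars, ordered by year
toy_per_person <- c(100, 110, 120, 130, 140, 150)

gap_vs_avg <- diff_from_trailing_mean(toy_per_person)
print(gap_vs_avg)
```

Applied to the "PerPerson" column of "historical_spending.csv", the same helper would give the gap between the final year and its five predecessors, assuming the rows are sorted by year.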
library(ggplot2)
# Assuming 'data' has been read from "historical_spending.csv"
# Ensure this line is correctly loading your data
data <- read.csv("./data/historical_spending.csv")
# Optional: Create a new column to indicate which points to label
data$Label <- ifelse(seq_along(data$Year) %% 2 == 0, as.character(data$PerPerson), NA) # Label every other point
# Adjust the annotation position based on the PerPerson spending range
annotation_y_position <- max(data$PerPerson) * 0.95 # Adjust vertically to avoid overlap
annotation_x_position <- 2020 - 2 # Move text to the left of the vertical line if needed
spline_int <- as.data.frame(spline(data$Year, data$PerPerson))
# Create a line plot focused on Per Person Spending over the years
ggplot(data, aes(x = Year, y = PerPerson)) +
geom_line(data = spline_int, aes(x = x, y = y),color="#c90970", size = 1.0) + # Draw the line
geom_point(color = "blue") + # Add points
geom_text(aes(label = Label), vjust = -1, check_overlap = TRUE) + # Add labels for spaced-out points
geom_vline(xintercept = 2020, linetype = "dashed", color = "black") + # Add a vertical line for the pandemic start
annotate("text", x = annotation_x_position, y = annotation_y_position, label = "COVID-19 Pandemic Start", vjust = -0.5, color = "black", angle = 0) + # Adjusted annotation
theme_minimal() + # Use a minimal theme
labs(title = "Per Person Spending Over the Years",
x = "Year",
y = "Per Person Spending (US Dollar)") +
scale_x_continuous(breaks = data$Year) # Ensure all years are shown
In this section, a multi-line chart illustrates the trend for each gift category. Notably, for 2022, spending on jewelry and an evening out is projected to rise to more than $45 and $31, respectively. After the pandemic began, only gift card purchases surged, while other categories trended downward. The most significant decline was in dining out, which fell nearly to the level of spending on clothing. However, the costliest gift category, jewelry, more than doubled from $21.52 to $45.57 over the recorded period, while spending on the remaining categories stayed relatively unchanged, resulting in an overall uptrend in spending per person. We also obtained the price of gold over the same period for additional insight. Moreover, we compare the percentage of people buying each category rather than the amount spent. A surprising observation is that the proportion for jewelry does not change much.
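The claim that jewelry spending more than doubled can be verified with a short calculation in base R. The two dollar amounts are the ones quoted above, and the helper `pct_change` is a name introduced only for this sketch.

```r
# Percentage change from an old value to a new value
pct_change <- function(old, new) (new - old) / old * 100

jewelry_start <- 21.52  # dollars, start of the recorded period (from the text)
jewelry_end   <- 45.57  # dollars, end of the recorded period (from the text)

growth_ratio <- jewelry_end / jewelry_start          # a ratio above 2 means "more than doubled"
growth_pct   <- pct_change(jewelry_start, jewelry_end)

print(round(growth_ratio, 2))
print(round(growth_pct, 1))
```

The ratio is slightly above 2, which is consistent with the statement that the amount more than doubled.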
library(ggplot2)
library(tidyr)
library(dplyr) # For data manipulation
# Assuming your data is already loaded into 'data'
data <- read.csv("./data/historical_spending.csv")
# Transform data from wide to long format
data_long <- pivot_longer(data, cols = -Year, names_to = "Category", values_to = "Spending")
# Filter out 'PerPerson' and 'PercentCelebrating' categories
data_long_filtered <- data_long %>%
filter(!Category %in% c("PerPerson", "PercentCelebrating")) %>%
mutate(SpendingLabel = ifelse(Year %% 6 == 0, as.character(Spending), NA)) # Label only years divisible by 6 (2010, 2016, 2022) for clarity
# Custom color palette (adjust as needed for your categories)
my_colors <- c("Candy" = "darkred", "Flowers" = "darkgreen", "Jewelry" = "#0072B2", "GreetingCards" = "darkorange", "EveningOut" = "#5D3FD3", "Clothing" = "darkmagenta", "GiftCards" = "darkcyan")
# Create the line plot with customized points
ggplot(data_long_filtered, aes(x = Year, y = Spending, color = Category)) +
geom_line() +
geom_point(aes(shape = Category), size = 2, stroke = 2) + # Customized points with different shapes for categories
geom_text(aes(label = SpendingLabel), vjust = -1.5, check_overlap = TRUE) + # Add spending labels for the selected years only
geom_vline(xintercept = 2020, linetype = "dashed", color = "red", size = 1) + # Add a vertical line for the COVID-19 pandemic start
annotate("text", x = 2020, y = 23, label = "COVID-19 Pandemic Start", vjust = -1, color = "red", angle = 0, hjust = 1.1, size = 5) + # Annotate the line
theme_minimal() +
theme(
panel.grid.major = element_line(color = "grey80"), # Darker grid lines
panel.grid.minor = element_line(color = "grey80", size = 0.25) # Darker and finer minor grid lines
) +
labs(title = "Yearly Spending on Different Gift Categories",
x = "Year",
y = "Spending (US Dollar)",
color = "Category") +
scale_x_continuous(breaks = seq(min(data$Year), max(data$Year), by = 1)) + # Adjust the x-axis breaks if needed
scale_y_continuous(labels = scales::comma) + # Use comma for large numbers, remove if you prefer the log scale
scale_shape_manual(values = c(16, 17, 18, 19, 20, 21, 22)) + # Custom shapes for categories, adjust numbers as needed
scale_color_manual(values = my_colors) # Use custom colors
The analysis reveals clear trends in Valentine's Day celebrations and spending patterns over the years. Despite a general decline in the share of people taking part in Valentine's Day festivities, a surge was observed in 2020, the year of the COVID-19 outbreak, followed by a continuation of the downward trend afterwards. Conversely, expenditure on Valentine's Day rose steadily between 2010 and 2022, with consumers anticipating higher spending per person, notably on jewelry and an evening out. During the pandemic, gift card purchases surged while other categories declined, particularly dining out, which approached the level of clothing expenditure. Jewelry emerged as the costliest gift category, more than doubling in average expenditure per person. Overall, the analysis indicates shifting trends in both participation and spending habits, influenced by external factors such as the pandemic.