dangkh / DataVisu


COMP4010/5120 Teaching team review #2

Open tienvu95 opened 6 months ago

tienvu95 commented 6 months ago

Given that gifts_gender has only 2 rows, how are you going to visualize it? For instance, if I want to visualize how men and women spend differently on candy, how can I do that? If I use a bar chart, there will be just 2 bars representing 2 numbers. This leads to the question: do we really need to visualize such data? The same applies to gifts_age.

If you use a stacked bar chart, what is the unit of the y-axis? Do you think a stacked bar chart is a good option?

For Question 2, there is little data to work with as well. I'm not sure about the data on external factors, but make sure to explain in detail how you collect, preprocess, and incorporate it into your analysis.

dangkh commented 6 months ago

We would like to thank the reviewers for their insightful questions and critical ideas. In this response letter, we address each of the reviewers' questions directly.

Question 1: Given that gifts_gender has only 2 rows, how are you going to visualize it?

gifts_gender.csv can be visualized with a grouped bar chart. The R implementation is as follows:

# Load the required libraries
library(ggplot2)
library(tidyr)

# Read the dataset (adjust the path as needed)
data <- read.csv("gifts_gender.csv")

# Inspect the first rows to confirm the data loaded correctly
# head(data)

# Reshape from wide to long: one row per (Gender, GiftType) pair
data_long <- pivot_longer(data, cols = -Gender, names_to = "GiftType", values_to = "Percentage")

# Dodge width slightly larger than the bar width leaves a small gap between groups
dodge <- position_dodge(width = 0.5)

# Create the horizontal grouped bar plot
ggplot(data_long, aes(y = GiftType, x = Percentage, fill = Gender)) +
  geom_col(position = dodge, width = 0.45) +
  # Place the percentage label just past the end of each bar
  geom_text(aes(label = sprintf("%.1f%%", Percentage), x = Percentage + 1), position = dodge, hjust = -0.1, size = 3.8) +
  scale_fill_manual(values = c("Men" = "red", "Women" = "blue")) +
  theme_minimal() +
  labs(title = "Average Percentage Spending on Gift Types by Gender",
       y = "Gift Type",
       x = "Average Percentage Spending",
       fill = "Gender") +
  theme(axis.text.y = element_text(angle = 0, hjust = 1))

This will result in the following visualization:

Picture1

Our results show that men spend 56% of their money on flowers for Valentine's Day. Women, on the other hand, spend 59% of their money on candy.

Question 2: For instance, I want to visualize how male and female spend differently on candy, how can I do that?

The reviewer can use the same R code as in question 1.
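As a rough sketch of how the Question 1 code could be narrowed to a single gift type, the snippet below filters the reshaped data before plotting. The helper names `filter_gift` and `plot_gift` are hypothetical and not part of our submitted code; the column names `Gender`, `GiftType`, and `Percentage` are the ones used in Question 1.

```r
library(ggplot2)
library(tidyr)

# Hypothetical helper: reshape the wide table and keep one gift type only
filter_gift <- function(data, gift) {
  data_long <- pivot_longer(data, cols = -Gender,
                            names_to = "GiftType", values_to = "Percentage")
  data_long[data_long$GiftType == gift, ]
}

# Hypothetical helper: one bar per gender for the chosen gift type
plot_gift <- function(data, gift) {
  ggplot(filter_gift(data, gift), aes(x = Gender, y = Percentage, fill = Gender)) +
    geom_col(width = 0.5, show.legend = FALSE) +
    geom_text(aes(label = sprintf("%.1f%%", Percentage)), vjust = -0.5) +
    labs(title = paste("Spending on", gift, "by Gender"),
         x = "Gender", y = "Percentage") +
    theme_minimal()
}

# Usage, assuming gifts_gender.csv is in the working directory:
# data <- read.csv("gifts_gender.csv")
# plot_gift(data, "Candy")
```

With only two genders this yields two bars, so the same numbers could also be reported in a sentence or a small table.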

Question 3: This leads to the question: do we really need to visualize such data?

Yes, because the visualization shows that men spend most of their money (56%) on flowers for Valentine's Day. Women, on the other hand, spend most of their money (59%) on candy.

Question 4: Same thing for gifts_age. If you use a stacked bar chart, what is the unit of the y-axis?

To visualize gifts_age.csv, we do not use a stacked bar chart. We use simple bar charts and grouped bar charts. Our visualizations are as follows:

Picture2 Picture3 Picture4

Question 5: Do you think a stacked bar chart is a good option?

The answer is no. Our group decided not to use a stacked bar chart for two reasons.

Difficulty in comparing individual segments: Stacked bar charts make it hard to compare the size of individual segments across bars, because only the bottom segment of each bar shares a common baseline. A viewer's ability to judge segment lengths accurately is impaired when the segments start at different heights.

Complexity with many categories: When there are many segments within each bar, the chart becomes cluttered, which makes it difficult to extract meaningful insights and can lead to misinterpretation of the data.
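The baseline issue can be illustrated with a small sketch that draws the same data both ways. The data frame `demo` below contains made-up values for illustration only; it is not taken from gifts_age.csv, and the age labels are placeholders.

```r
library(ggplot2)

# Made-up illustrative values, NOT taken from gifts_age.csv
demo <- data.frame(
  Age = rep(c("Group A", "Group B", "Group C"), each = 3),
  GiftType = rep(c("Candy", "Flowers", "Jewelry"), times = 3),
  Percentage = c(10, 20, 30, 15, 25, 20, 30, 10, 25)
)

# Stacked: only the bottom segment of each bar starts at zero
p_stacked <- ggplot(demo, aes(x = Age, y = Percentage, fill = GiftType)) +
  geom_col(position = "stack") +
  labs(title = "Stacked") +
  theme_minimal()

# Grouped: every bar starts at zero, so lengths are directly comparable
p_grouped <- ggplot(demo, aes(x = Age, y = Percentage, fill = GiftType)) +
  geom_col(position = position_dodge(width = 0.8), width = 0.7) +
  labs(title = "Grouped") +
  theme_minimal()

# print(p_stacked)
# print(p_grouped)
```

In the stacked version the upper segments float at different heights, while in the grouped version all bars share the zero baseline.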

Question 6: There is little data to work with as well. I'm not sure about the data on external factors, but make sure to explain in detail how you collect, preprocess, and incorporate it into your analysis.

The external factor that our group incorporates into the analysis is the start of the Covid-19 pandemic in 2020. The source is the official WHO declaration: https://www.who.int/director-general/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19---11-march-2020 Because of lockdowns, we expect the pandemic to have affected consumer behavior around Valentine's Day in the following years. Our analysis shows a consistent drop over the years in the percentage of people celebrating Valentine's Day, in spending per person, and in spending on jewelry and an evening out. Surprisingly, yearly spending on gift cards increased.

Picture7
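One possible way to incorporate the pandemic start year into a yearly table is to add an indicator column, as in the minimal sketch below. The helper `add_pandemic_flag`, the `Period` column, and the assumption that the yearly data has a `Year` column are all hypothetical; the default start year of 2020 follows the choice stated above and can be adjusted.

```r
# Hypothetical helper: label each row as before or from the assumed pandemic start year
add_pandemic_flag <- function(df, year_col = "Year", start_year = 2020) {
  df$Period <- ifelse(df[[year_col]] >= start_year,
                      "Pandemic (2020 onward)", "Pre-pandemic")
  df
}

# Usage, assuming a data frame `yearly` with a Year column (hypothetical name):
# yearly <- add_pandemic_flag(yearly)
# table(yearly$Period)
```

The resulting `Period` column could then be mapped to color or used to split summaries when comparing trends before and after 2020.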