Open karen-liz opened 4 years ago
The link requires a sign-in to access the data. Could you attach the CSV here with a truncated list of rows?
Alternatively, and as a best practice, could you provide a sample of the data you wish to plot in tabular format, or recreate a small sample of it? (Unless the problem is in extracting it, of course!)
For example:
x <- data.frame(some_variable = c(5, 10, 20, 50, 100),
                another_variable = c("A", "C", "B", "A", "C"))
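If the data is already loaded in R, one common way to share a sample is dput(), which prints R code that rebuilds an object. A minimal sketch, using the toy data frame from the example as a stand-in for the real data (the name sample_rows is just for illustration):

```r
# Toy data frame standing in for the real data (same values as the example)
x <- data.frame(some_variable = c(5, 10, 20, 50, 100),
                another_variable = c("A", "C", "B", "A", "C"))

# Keep only the first few rows so the post stays short
sample_rows <- head(x, 3)

# Prints a structure(...) call that can be pasted into a post and
# assigned to a name on the other end to rebuild the same rows
dput(sample_rows)
```

Anyone reading the thread can then paste the printed code and work with the same rows without needing access to the file.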
whatsgoodlydata10-which-social-media-millennials-care-about-most-QueryResult.xlsx
Okay, here's an xlsx file hoping this works!
Here's a screenshot if the xlsx doesn't work.
You could use the filter() function from the dplyr package, like so:
library(readxl)
library(dplyr)
file <- "2020-02-29_soc-media_data.xlsx"
dat <- read_excel(file); rm(file)
dat %>%
  filter(segment_type == "Gender") %>%
  select(segment_type:percentage)
# A tibble: 8 x 5
segment_type segment_description answer count percentage
<chr> <chr> <chr> <dbl> <dbl>
1 Gender Female respondents Instagram 1576 0.3
2 Gender Female respondents Facebook 644 0.122
3 Gender Female respondents Snapchat 2967 0.564
4 Gender Female respondents Linkedin 73 0.014
5 Gender Male respondents Instagram 1008 0.24
6 Gender Male respondents Facebook 565 0.135
7 Gender Male respondents Snapchat 2483 0.591
8 Gender Male respondents Linkedin 142 0.034
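For anyone who cannot open the file, the same step can be tried on a stand-in. This is only a sketch: the data frame below is rebuilt from the eight rows printed above (the real file has more rows and columns), and it uses base R's subset() as a rough counterpart to filter() plus select().

```r
# Stand-in for `dat`: only the eight Gender rows printed above
dat <- data.frame(
  segment_type        = "Gender",
  segment_description = rep(c("Female respondents", "Male respondents"), each = 4),
  answer              = rep(c("Instagram", "Facebook", "Snapchat", "Linkedin"), times = 2),
  count               = c(1576, 644, 2967, 73, 1008, 565, 2483, 142),
  percentage          = c(0.3, 0.122, 0.564, 0.014, 0.24, 0.135, 0.591, 0.034)
)

# Base R counterpart of filter() + select():
# keep the matching rows and a contiguous range of columns
gender_rows <- subset(dat, segment_type == "Gender",
                      select = segment_type:percentage)
gender_rows
```

Because the stand-in holds only Gender rows, the condition keeps all eight of them; on the full data it would drop the other segment types.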
You could spruce it up a bit more with mutate() and a find-and-replace using gsub():
dat %>%
  filter(segment_type == "Gender") %>%
  mutate(gender = gsub(x = segment_description,
                       pattern = " respondents",
                       replacement = "")) %>%
  select(gender, answer:percentage)
# A tibble: 8 x 4
gender answer count percentage
<chr> <chr> <dbl> <dbl>
1 Female Instagram 1576 0.3
2 Female Facebook 644 0.122
3 Female Snapchat 2967 0.564
4 Female Linkedin 73 0.014
5 Male Instagram 1008 0.24
6 Male Facebook 565 0.135
7 Male Snapchat 2483 0.591
8 Male Linkedin 142 0.034
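The find-and-replace step can be tried on its own, outside the pipeline. A small sketch, assuming the two description strings shown above; it uses sub() with fixed = TRUE, which treats the pattern as literal text rather than a regular expression and gives the same result as gsub() here because the suffix occurs once per string:

```r
# The two descriptions that appear in the Gender rows
segment_description <- c("Female respondents", "Male respondents")

# Strip the literal suffix " respondents" from each string
gender <- sub(" respondents", "", segment_description, fixed = TRUE)
gender
```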
Now, this is in "tidy" format (Wickham, 2014), which is great for plotting in packages like ggplot2. However, you may need to untidy it for plotting with base R graphics (or other reasons). You can untidy your data with the spread() function from the tidyr package.
library(tidyr)
dat %>%
  filter(segment_type == "Gender") %>%
  mutate(gender = gsub(x = segment_description,
                       pattern = " respondents",
                       replacement = "")) %>%
  select(gender, answer:count) %>%
  spread(key = gender, value = count)
# A tibble: 4 x 3
answer Female Male
<chr> <dbl> <dbl>
1 Facebook 644 565
2 Instagram 1576 1008
3 Linkedin 73 142
4 Snapchat 2967 2483
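In current versions of tidyr, spread() still works but is superseded by pivot_wider(). A sketch of the same reshaping with the newer function, assuming a stand-in data frame (named tidy here purely for illustration) rebuilt from the counts shown above:

```r
library(tidyr)

# Stand-in for the tidy gender/answer/count table printed earlier
tidy <- data.frame(
  gender = rep(c("Female", "Male"), each = 4),
  answer = rep(c("Instagram", "Facebook", "Snapchat", "Linkedin"), times = 2),
  count  = c(1576, 644, 2967, 73, 1008, 565, 2483, 142)
)

# One row per answer, one column per gender, cells filled from count
wide <- pivot_wider(tidy, names_from = gender, values_from = count)
wide
```

The row order may differ from the spread() output above, but the cell values should match.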
Because you have two categorical variables (answer and gender), as well as a continuous variable (count), you'll probably want a stacked or grouped bar plot. Here's a great Stack Overflow post for very similar data and a few different solutions.
P.S. How I'd approach it in ggplot2 would be like so:
library(dplyr)
library(tidyr)
library(readxl)
library(ggplot2)
file <- "2020-02-29_soc-media_data.xlsx"
dat <- read_excel(file); rm(file)
dat %>%
  filter(segment_type == "Gender") %>%
  mutate(gender = gsub(x = segment_description,
                       pattern = " respondents",
                       replacement = "")) %>%
  select(gender, answer:count) %>%
  ggplot(aes(x = reorder(answer, -count),
             y = count,
             fill = gender)) +
  geom_bar(stat = "identity") +
  theme_minimal()
This gives you:
You can spruce that up a bit with some extra functions/arguments in ggplot2 and comma from the scales package:
library(dplyr)
library(tidyr)
library(readxl)
library(scales)
library(ggplot2)
file <- "2020-02-29_soc-media_data.xlsx"
dat <- read_excel(file); rm(file)
dat %>%
  filter(segment_type == "Gender") %>%
  mutate(gender = gsub(x = segment_description,
                       pattern = " respondents",
                       replacement = "")) %>%
  select(gender, answer:count) %>%
  ggplot(aes(x = reorder(answer, -count),
             y = count,
             fill = gender)) +
  geom_bar(stat = "identity",
           alpha = 0.75) +
  scale_y_continuous(labels = comma) +
  labs(fill = "Gender",
       x = "Preference",
       y = "Respondents",
       title = "Preferences by Platform & Gender",
       subtitle = "9,458 Respondents",
       caption = "Source: Data World") +
  theme_minimal()
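The plots above stack the two genders in one bar per platform. If a grouped (side-by-side) version is preferred, one option is position = "dodge". A rough sketch, assuming a stand-in data frame (called tidy here for illustration) rebuilt from the counts in this thread rather than read from the file:

```r
library(ggplot2)

# Stand-in for the filtered data: gender, answer, and count only
tidy <- data.frame(
  gender = rep(c("Female", "Male"), each = 4),
  answer = rep(c("Instagram", "Facebook", "Snapchat", "Linkedin"), times = 2),
  count  = c(1576, 644, 2967, 73, 1008, 565, 2483, 142)
)

# geom_col() is shorthand for geom_bar(stat = "identity");
# position = "dodge" places the bars for each gender side by side
p <- ggplot(tidy, aes(x = reorder(answer, -count),
                      y = count,
                      fill = gender)) +
  geom_col(position = "dodge") +
  labs(x = "Preference", y = "Respondents", fill = "Gender") +
  theme_minimal()

print(p)
```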
And that's my code-through for your code-through :). Hope this helps!
Jamison, I kept reading online that ggplot2 would be a great package for this but I couldn't figure it out. Thank you, this was so much help, and I can now do what I have been trying to do for the past day! Thanks again and have a wonderful weekend!
A core R version as well :-)
# Recreate this table:
#   answer    Female  Male
#   <chr>      <dbl> <dbl>
# 1 Facebook     644   565
# 2 Instagram   1576  1008
# 3 Linkedin      73   142
# 4 Snapchat    2967  2483
# From the full data, sum the count column by group with xtabs()
# (table() would only tally rows, not add up count):
gender_rows <- subset(dat, segment_type == "Gender")
counts <- xtabs(count ~ segment_description + answer, data = gender_rows)
# quick build of data for demo
segment <- c("Facebook", "Instagram", "Linkedin", "Snapchat")
female <- c(644, 1576, 73, 2967)
male <- c(565, 1008, 142, 2483)
counts <- rbind(female, male)
barplot(counts, beside = TRUE,
        col = c("aquamarine3", "coral"),
        names.arg = segment)
legend("topleft", c("female", "male"), pch = 15,
       col = c("aquamarine3", "coral"),
       bty = "n")
ggplot is a lot nicer, but the data steps can be more complicated at times.
@karen-liz you're very welcome! Glad this was helpful. I think it's a great example of how these packages interface quite nicely in the "Tidyverse" ecosystem (tidyr, dplyr, readxl, and ggplot2). You can pipe your data directly from the web and visualize it in a single expression!
@lecy Thank you for this! I was actually struggling with grouped bar plots in base R graphics and I found an example that was very similar to yours - I just couldn't get the data layer right!
ggplot2 is nice but I'm sure you can make something really polished in graphics. The NYT visualization has given me a new appreciation!
qplot() in ggplot2 is easy to use and good for "quick and dirty" graphics but lacks customization options, so that's when you'd have to really learn the systems under the hood!
library(ggplot2)
# `tidy` is assumed to hold the gender, answer, and count columns built earlier
qplot(data = tidy,
      x = reorder(answer, -count),
      y = count,
      fill = gender,
      xlab = "Preference",
      ylab = "Respondents",
      main = "Preferences by Platform & Gender",
      geom = "col") +
  theme_minimal()
This is exactly what I was struggling with! Wow, I'm impressed that there are multiple ways to do this. Thank you both! This was extremely helpful!
We're just nerding out in public :).
P.S. Glad this helped! P.P.S. You can also plot it with 'googleVis', 'lattice', 'plotly', and a few other data viz packages!
Hi! I'm working on my Code Through assignment and I'm having a difficult time finding a way to plot certain variables. My dataset includes a variable, "Segment_description", which holds a variety of descriptions such as "mobile respondents," "female respondents," "Your parents make? $90K-$240k," etc.
Here's the dataset: https://data.world/ahalps/which-social-media-millennials-care-about-most/workspace/query?queryid=sample-0
For this Code Through, I want to focus on gender, so I'd like to plot the answers for "female respondents" and "male respondents." The problem I am having is that I can't figure out how to plot the variable "Answer" on the x-axis and "Count" on the y-axis. The reason is that I can't figure out how to extract these specific rows in these columns. I've tried using the operator "&" but for some reason, I can't get this to plot. Any guidance would be greatly appreciated!
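The post does not show how "&" was used, so the following is only a guess at the cause. If the two gender conditions are joined with "&", no row can satisfy both at once and the result is empty; "%in%" (or "|") keeps rows that match either one. A sketch on a stand-in data frame rebuilt from the eight gender rows quoted in this thread, with lowercase column names as they appear in the printed output:

```r
# Stand-in for the data: only the description, answer, and count columns
dat <- data.frame(
  segment_description = rep(c("Female respondents", "Male respondents"), each = 4),
  answer              = rep(c("Instagram", "Facebook", "Snapchat", "Linkedin"), times = 2),
  count               = c(1576, 644, 2967, 73, 1008, 565, 2483, 142)
)

# "&" requires both conditions to hold on the same row, which never happens
both_and <- dat[dat$segment_description == "Female respondents" &
                  dat$segment_description == "Male respondents", ]
nrow(both_and)  # 0

# "%in%" keeps rows whose description matches either value
either <- dat[dat$segment_description %in%
                c("Female respondents", "Male respondents"), ]
nrow(either)  # 8
```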