Watts-College / paf-513-template

https://watts-college.github.io/paf-513-template/
MIT License

Lab 5 - Part 2 - Question 1: #18

Open swest235 opened 6 months ago

swest235 commented 6 months ago

dat %>% select(age, hour12) %>% group_by(Time = hour12, Age.Group = age) %>%
  summarize(Accidents = n()) %>% pander()

@JasonSills I'm using the code above and can technically find the answer in the resulting table, but the table isn't clean or organized, so it takes a lot of searching to find the 7 AM rows. How can I clean this up? I imagine it's a simple fix I should know, but I'm having a hard time cracking it. Thank you in advance!

[Screenshot of the resulting table]

JasonSills commented 6 months ago

@swest235 We can simplify your code using the count verb. Let's start with a basic table.

dat %>% 
  count(hour, age) %>%
  head()

If we then add a filter and arrange the counts in descending order, we get the table you want:


dat %>% 
  filter(hour == "07") %>%
  count(hour, age) %>%
  arrange(desc(n))

Note that we filter first, count the columns we are interested in, and then arrange in descending order.
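A minimal sketch of that pipeline on a tiny made-up data frame (the `toy` rows and age labels are hypothetical; only the column names `hour` and `age` come from the thread). It also rebuilds the same table with `group_by()` + `summarise()`, the approach in the original question, to show that `count()` is shorthand for that pattern:

```r
library(dplyr)

# Hypothetical toy data standing in for the lab's `dat`;
# only the column names hour and age come from the thread.
toy <- data.frame(
  hour = c("07", "07", "07", "08", "07", "12"),
  age  = c("18-25", "26-35", "18-25", "18-25", "36-45", "26-35"),
  stringsAsFactors = FALSE
)

# Filter to the 7 AM hour, count rows per hour/age pair,
# then put the largest counts first.
by_count <- toy %>%
  filter(hour == "07") %>%
  count(hour, age) %>%
  arrange(desc(n))

# The same table via group_by() + summarise(), as in the question;
# count() is shorthand for this pattern.
by_summarise <- toy %>%
  filter(hour == "07") %>%
  group_by(hour, age) %>%
  summarise(n = n(), .groups = "drop") %>%
  arrange(desc(n))

print(by_count)
```

Swapping `toy` for `dat` should give the 7 AM table directly, assuming `hour` is stored as a two-character string such as `"07"`.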

swest235 commented 6 months ago

@JasonSills Why does using hour == "07" and hour12 == "7 AM" generate different counts for accidents?

I have the following three code chunks with their respective outputs. In the second chunk, when we use hour instead of hour12, we get very different counts of 7 AM accidents per age group. When I switch the second chunk to use hour12, it generates different results.

hour == "07" and hour12 == "7 AM" should be the same, shouldn't they?

Which is correct?

[Screenshots of the three code chunks and their outputs]

JasonSills commented 6 months ago

@swest235 The two with hour12 == "7 AM" appear to be the same, but arranged differently.

I'm interested in the difference between hour and hour12, so I'm going to see if @lecy has any insights. When I run:

dat %>%
  select(DateTime, hour, hour12)

It looks like DateTime and hour do not align with hour12:

[Screenshot of the DateTime, hour, and hour12 columns]

Is this a known bug?
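One way to check alignment without eyeballing rows is to cross-tabulate the two columns: in consistent data, each 24-hour `hour` value should pair with exactly one `hour12` label. A sketch on hypothetical toy data in which one row is deliberately misaligned; on the real data the same pipeline would start from `dat` instead of `toy`:

```r
library(dplyr)

# Hypothetical toy data: row 3 is deliberately misaligned, so its
# hour ("07") and hour12 label ("12 AM") disagree.
toy <- data.frame(
  hour   = c("07", "07", "07", "19"),
  hour12 = c("7 AM", "7 AM", "12 AM", "7 PM"),
  stringsAsFactors = FALSE
)

# Every observed (hour, hour12) pair with its row count
pairs <- toy %>% count(hour, hour12)

# Keep only hours that map to more than one hour12 label
conflicts <- pairs %>%
  group_by(hour) %>%
  filter(n_distinct(hour12) > 1) %>%
  ungroup()

print(conflicts)
```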

dongdongkim99 commented 6 months ago

@JasonSills @swest235 I also get different results for 'hour' and 'hour12'. Could it be a system time or location settings issue? I think I have to use 'hour'.
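To illustrate how time zone settings could produce this kind of mismatch: the same instant, formatted in two zones, lands in different hours. The timestamp and the `America/Phoenix` zone below are arbitrary choices for illustration, not taken from the lab data. This only shows the mechanism; it does not establish that time zones caused the discrepancy here.

```r
# One instant, stored in UTC (arbitrary example timestamp)
t <- as.POSIXct("2020-01-15 07:30:00", tz = "UTC")

# 24-hour clock in the timestamp's own zone (UTC)
hour_utc <- format(t, "%H")

# Same instant rendered in America/Phoenix (UTC-7, no daylight saving)
hour_phoenix <- format(t, "%H", tz = "America/Phoenix")

print(c(utc = hour_utc, phoenix = hour_phoenix))
```

`%H` is used because it does not depend on the locale, unlike `%p` (the AM/PM marker).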

[Screenshots of the two sets of results]

JasonSills commented 6 months ago

@dongdongkim99 and @swest235

I have identified the issue. Note that, per the lab instructions, you need to update your data preprocessing.

From the instructions:

# set the levels so they are in the correct order
time.levels <-
  c( "12 AM", " 1 AM", " 2 AM", " 3 AM", " 4 AM", " 5 AM", 
     " 6 AM", " 7 AM", " 8 AM", " 9 AM", "10 AM", "11 AM", 
     "12 PM", " 1 PM", " 2 PM", " 3 PM", " 4 PM", " 5 PM", 
     " 6 PM", " 7 PM", " 8 PM", " 9 PM", "10 PM", "11 PM" )

dat$hour12 <- factor( dat$hour12, levels=time.levels )
table( dat$hour12 ) %>% head() %>% pander()

But your lab Rmd file provides:

time.levels <- c("12 AM", paste(1:11, "AM"), 
                 "12 PM", paste(1:11, "PM"))

If you are unable to update this for your submission, you will see how to do it step by step in the solutions.
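Why the level vector matters: `factor()` silently converts any value that matches none of the supplied levels to `NA`, and those rows then drop out of `table()` and out of `filter()` comparisons. A sketch, assuming (hypothetically) that the raw `hour12` strings are space-padded like the instructions' levels; the `sprintf()` call rebuilds the instructions' vector, and the `raw` values are made up:

```r
# Level vector from the instructions: single-digit hours padded with a space
levels_instr <- c(sprintf("%2d AM", c(12L, 1:11)), sprintf("%2d PM", c(12L, 1:11)))

# Level vector from the lab Rmd: no padding
levels_rmd <- c("12 AM", paste(1:11, "AM"), "12 PM", paste(1:11, "PM"))

# Hypothetical raw hour12 strings, assumed space-padded
raw <- c(" 7 AM", " 7 AM", "10 AM", "12 PM")

f_instr <- factor(raw, levels = levels_instr)
f_rmd   <- factor(raw, levels = levels_rmd)

# Values with no matching level become NA
na_instr <- sum(is.na(f_instr))  # every value matches a level
na_rmd   <- sum(is.na(f_rmd))    # the two " 7 AM" values match nothing

# A 7 AM filter finds both rows only when the levels match the raw strings
hits_instr <- sum(f_instr == " 7 AM", na.rm = TRUE)
hits_rmd   <- sum(f_rmd == "7 AM", na.rm = TRUE)

print(c(na_instr = na_instr, na_rmd = na_rmd,
        hits_instr = hits_instr, hits_rmd = hits_rmd))
```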

dongdongkim99 commented 6 months ago

@JasonSills Do you mean the two below are different?

time.levels <- c("12 AM", paste(1:11, "AM"), "12 PM", paste(1:11, "PM"))

time.levels <- c( "12 AM", " 1 AM", " 2 AM", " 3 AM", " 4 AM", " 5 AM", " 6 AM", " 7 AM", " 8 AM", " 9 AM", "10 AM", "11 AM", "12 PM", " 1 PM", " 2 PM", " 3 PM", " 4 PM", " 5 PM", " 6 PM", " 7 PM", " 8 PM", " 9 PM", "10 PM", "11 PM" )

I changed the first version to the second, but the issue was not solved. T_T
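The two vectors do differ: every single-digit hour is `" 7 AM"` (with a leading space) in one and `"7 AM"` in the other. The sketch below locates those positions, lists which (hypothetical) raw labels match no level, and shows one whitespace-insensitive conversion using `trimws()`. That normalization is a suggestion, not the course's prescribed fix, and it would not help if `hour` and `hour12` themselves disagree:

```r
# The two level vectors discussed in the thread
levels_rmd   <- c("12 AM", paste(1:11, "AM"), "12 PM", paste(1:11, "PM"))
levels_instr <- c(sprintf("%2d AM", c(12L, 1:11)), sprintf("%2d PM", c(12L, 1:11)))

# Positions where they disagree: the single-digit hours
diff_pos <- which(levels_rmd != levels_instr)

# Stripping surrounding whitespace makes them identical
same_after_trim <- identical(levels_rmd, trimws(levels_instr))

# Hypothetical raw labels with mixed padding
raw <- c(" 7 AM", "7 AM", "11 PM")

# Labels that match no level in the Rmd vector
unmatched <- setdiff(unique(raw), levels_rmd)

# Whitespace-insensitive conversion: trim the data before matching
hour12_fixed <- factor(trimws(raw), levels = levels_rmd)

print(diff_pos)
print(same_after_trim)
print(unmatched)
print(table(hour12_fixed)["7 AM"])
```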

JasonSills commented 6 months ago

@dongdongkim99 You will have the solutions tomorrow; please move forward with your best effort.

dongdongkim99 commented 6 months ago

@JasonSills Oh, I understand. I'll try my best.