Watts-College / cpp-527-fall-2021

A course shell for CPP 527 Foundations of Data Science II
https://watts-college.github.io/cpp-527-fall-2021/

Final Project - Step 9 #72

Open sandralili opened 3 years ago

sandralili commented 3 years ago

Hello Dr. @lecy,

I created a function for Step 8, create_salary_table():

create_salary_table <- function ()
{
t.salary <- 
  d2 %>% 
  filter( ! is.na( title ) & title != "") %>% 
  group_by( title, gender ) %>% 
  summarize( q25=quantile(salary,0.25),
             q50=quantile(salary,0.50),
             q75=quantile(salary,0.75),
             n=n(), .groups = 'drop' ) %>% 
  ungroup() %>% 
  mutate( p= round( n/sum(n), 2) )
return(t.salary)
}

When I tried to call the function in Step 9, I got the following error:

d2 %>%  
  filter( Department.Description == "Psychology" ) %>% 
  create_salary_table( t.salary ) %>% 
  build_graph( unit="Psychology" )

_Error in create_salary_table(., t.salary) : unused argument (., t.salary)_

I appreciate any help, thanks.

lecy commented 3 years ago

Remember the recipe example for functions. If you are at the combine dry ingredients stage you need to pass flour, baking powder, and sugar to the function so it can be mixed. If you are at the bake cookies stage you need to pass “cookie dough”, a temperature, and a time to the function in order to complete the task.

You need to provide a function with all of the information it needs to complete the task.

Note you define no arguments here:

create_salary_table <- function ()
{…}

But you are passing two arguments here: t.salary, as well as the filtered d2 that the pipe operator passes in as the first argument:

d2 %>%
filter( Department.Description == "Psychology" ) %>%
create_salary_table( t.salary )

What’s the minimal amount of information the create_salary_table() function would need to perform the calculation? If you don’t pass a data frame to the function as an argument it will pull whatever d2 is currently in the environment to create the table, or give an error if there is no d2 currently defined.
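
To make the point above concrete, here is a minimal base R sketch. The toy data frame and the helper names mean_salary_global() and mean_salary() are invented for illustration and are not part of the course code. It shows a zero-argument function quietly reading d2 from the environment where it was defined, and failing with an "unused argument" error as soon as data is handed to it.

```r
# Hypothetical toy data standing in for the course salary data.
d2 <- data.frame(
  dept   = c("Psychology", "Psychology", "History"),
  salary = c(50, 60, 100)
)

# No arguments: the body can only find d2 by looking outside the function,
# in the environment where the function was defined.
mean_salary_global <- function() {
  mean(d2$salary)
}

# One argument: the function uses exactly the data frame it is handed.
mean_salary <- function(d) {
  mean(d$salary)
}

psych <- d2[d2$dept == "Psychology", ]

res_global <- mean_salary_global()  # averages every row of d2
res_arg    <- mean_salary(psych)    # averages only the Psychology rows

# Handing data to the zero-argument version fails, as in the thread.
err_msg <- tryCatch(
  mean_salary_global(psych),
  error = function(e) conditionMessage(e)
)
```

The zero-argument version never sees the filtered rows, which is why a function meant for use in a pipe needs the data frame as its first argument.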

sandralili commented 3 years ago

Dr. @lecy, I missed adding the "ingredients" to the function. Thanks.

I added them, however, not sure if I am still missing something:

create_salary_table <- function (d2, title, gender, salary)

This is the error I am getting.

Error in pander(t.salary) : object 't.salary' not found

lecy commented 3 years ago

What is your code now? I'm guessing you left the pander() call inside the function?

Note that title, gender, and salary are all variables inside the data frame (after you add them on previous steps), so you only need to pass the data frame to the function:

create_salary_table <- function (d)
{
  t.salary <- 
    d %>% 
    filter( ! is.na( title ) & title != "") %>% 
    group_by( title, gender ) %>% 
    summarize( q25=quantile(salary,0.25),
               q50=quantile(salary,0.50),
               q75=quantile(salary,0.75),
               n=n() ) %>% 
    ungroup() %>% 
    mutate( p= round( n/sum(n), 2) )

  return(t.salary)
}

sandralili commented 3 years ago

No, Dr. @lecy, I didn't leave the pander() call inside. It is so weird. d7 is the data frame I'm using after dropping employees that don’t fit into the defined title categories (Step 7).


# STEP 8

create_salary_table <- function (d7)
{
  t.salary <- 
    d7 %>% 
    filter( ! is.na( title ) & title != "") %>% 
    group_by( title, gender ) %>% 
    summarize( q25=quantile(salary,0.25),
               q50=quantile(salary,0.50),
               q75=quantile(salary,0.75),
               n=n() ) %>% 
    ungroup() %>% 
    mutate( p= round( n/sum(n), 2) )

  return(t.salary)
}

pander( t.salary )

Error in pander(t.salary) : object 't.salary' not found

The function is still not working. This is very weird, because the code is the same.

sandralili commented 3 years ago

Dr. @lecy, it is working now. I was also missing the .groups argument:

n=n(), .groups = 'drop' ) %>%

I ran the code on a separate window and that's when I noticed it.

Thank you, thank you!!!

lecy commented 3 years ago

I'm not sure what the .groups='drop' argument is doing? There is already an ungroup() call in the function recipe.
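
As a hedged illustration of how the two options relate, the sketch below uses a made-up four-row data frame (only the column names come from the thread) and assumes dplyr 1.0 or later. With two grouping variables, summarize() keeps the result grouped by the first one unless told otherwise, so either ungroup() or .groups = "drop" leaves an ungrouped table.

```r
library(dplyr)

# Hypothetical toy data; column names follow the thread.
d <- data.frame(
  title  = c("Full", "Full", "Assoc", "Assoc"),
  gender = c("F", "M", "F", "M"),
  salary = c(120, 110, 90, 95)
)

# By default summarize() peels off only the last grouping variable,
# so this result is still grouped by title.
still_grouped <- d %>%
  group_by(title, gender) %>%
  summarize(n = n()) %>%
  group_vars()

# Option 1: remove the remaining grouping afterwards.
t_ungroup <- d %>%
  group_by(title, gender) %>%
  summarize(n = n()) %>%
  ungroup()

# Option 2: ask summarize() to drop all grouping itself.
t_drop <- d %>%
  group_by(title, gender) %>%
  summarize(n = n(), .groups = "drop")

# Compare the two results as plain data frames.
same <- isTRUE(all.equal(as.data.frame(t_ungroup), as.data.frame(t_drop)))
```

In this sketch the two tables hold the same values. The practical difference is that setting .groups explicitly also silences the note about grouped output.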

You also need to assign the function results before you can print them. Because of the rules of function scope, the object t.salary will not exist outside of the function until you create it there:

create_salary_table <- function (d7)
{
  t.salary <- 
    d7 %>% 
    filter( ! is.na( title ) & title != "") %>% 
    group_by( title, gender ) %>% 
    summarize( q25=quantile(salary,0.25),
               q50=quantile(salary,0.50),
               q75=quantile(salary,0.75),
               n=n() ) %>% 
    ungroup() %>% 
    mutate( p= round( n/sum(n), 2) )

  return(t.salary)
}

pander( t.salary )  # will create an error 

t.salary <- create_salary_table(d7)
pander( t.salary )  # should work
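
The scope rule can be checked directly with a small base R sketch. The function count_rows() and the object names t.local and t.result are hypothetical, chosen only for this illustration.

```r
# The object t.local is created inside the function and is discarded
# once the function returns; only the returned value survives.
count_rows <- function(d) {
  t.local <- data.frame(n = nrow(d))
  return(t.local)
}

d <- data.frame(x = 1:3)

count_rows(d)                       # result is printed but not stored anywhere
found_before <- exists("t.local")   # the inner name is not visible out here

t.result <- count_rows(d)           # assignment creates an object out here
found_after <- exists("t.result")
```

The name used inside the function does not matter to the caller; what matters is the name the returned value is assigned to.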

sandralili commented 3 years ago

Dr. @lecy, it works with your code, thanks. When I run the code I get this error, but it does work.

summarise() has grouped output by 'title'. You can override using the .groups argument.

I had this error when I was working on the code before creating a function. It is so weird.

Thanks for your help and support!

lecy commented 3 years ago

That’s an informational message, not an error. Messages and warnings are meant to inform or alert the user to unusual behavior; they do not by themselves signal that something was done wrong.
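
For readers who want to see the distinction, R has three kinds of conditions: messages, warnings, and errors. Only errors stop the code. The base R sketch below uses a hypothetical helper, classify(), to report which kind an expression signals. It does not involve dplyr.

```r
# Report which kind of condition an expression signals, if any.
# The argument is evaluated lazily, inside tryCatch(), so any condition
# it raises is caught by the matching handler.
classify <- function(expr) {
  tryCatch(
    {
      expr
      "no condition"
    },
    message = function(m) "message",
    warning = function(w) "warning",
    error   = function(e) "error"
  )
}

kinds <- c(
  classify(message("FYI")),      # informational note
  classify(warning("careful")),  # something looks unusual
  classify(stop("broken")),      # the code cannot continue
  classify(1 + 1)                # nothing signalled
)
```

Outside of tryCatch(), code that signals a message or a warning still runs to completion and returns its result, which matches what was observed in the thread.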

sandralili commented 3 years ago

Thanks!

aawoods97 commented 3 years ago

I am trying to test the code from Step 9, but I get an error regardless of what I put in as the first argument. The error reads: Error in build_graph(., t.salary, unit = "Psychology") : unused argument (t.salary)

And here is my code:

d %>% 
  filter( Department.Description == "Psychology" ) %>% 
  create_table() %>% 
  build_graph( t.salary, unit="Psychology" )

lecy commented 3 years ago

You are piping the data frame forward through the chain of functions. All of these functions expect a data frame as the first argument:

filter( d, age > 40 )

You omit the data frame when you pipe:

d %>% filter( age > 40 )

You are currently passing two data frames to build_graph() because of piping.

d %>% 
  filter( Department.Description == "Psychology" ) %>% 
  create_table() %>% 
  build_graph( t.salary, unit="Psychology" )

aawoods97 commented 3 years ago

That makes sense! I have changed the code to the following:

d %>% 
  filter( Department.Description == "Psychology" ) %>% 
  create_salary_table() %>% 
  build_graph( )

But now I am receiving this error: Error in UseMethod("filter") : no applicable method for 'filter' applied to an object of class "NULL"

Is this referring to the filter in the above code? Or is it related to the create_salary_table function?

lecy commented 3 years ago

Note:

# equivalent 
filter( d, age > 40 )
d %>% filter( age > 40 )

# equivalent 
build_graph( t.salary, unit="Psychology" )
t.salary %>% build_graph( unit="Psychology" )

You dropped your unit argument.
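
The equivalence can be verified with a small sketch. Since build_graph() is course code not shown in this thread, the example uses a hypothetical stand-in, label_rows(), that takes a data frame first and a unit second. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in for a function that takes a data frame plus a label.
label_rows <- function(d, unit) {
  paste(unit, nrow(d))
}

d <- data.frame(age = c(30, 45, 52))

# Direct call: the data frame is written out as the first argument.
direct <- label_rows(filter(d, age > 40), unit = "Psychology")

# Piped call: the pipe supplies the first argument, so it is omitted.
piped <- d %>%
  filter(age > 40) %>%
  label_rows(unit = "Psychology")

# Naming the data frame again inside a piped call passes it twice.
err_msg <- tryCatch(
  d %>% label_rows(d, unit = "Psychology"),
  error = function(e) conditionMessage(e)
)
```

Both calls give the same value, and the last one fails with an "unused argument" error of the same kind as the one reported above.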

aawoods97 commented 3 years ago

By the unit argument do you mean 'unit = "Psychology"'? If so I added it back

d %>% filter( Department.Description == "Psychology" ) %>% create_salary_table() %>% build_graph(unit = "Psychology" )

However, I am receiving the same error: Error in UseMethod("filter") : no applicable method for 'filter' applied to an object of class "NULL"

aawoods97 commented 3 years ago

I found my error! I was using pander() inside the earlier function.
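
One plausible reading of this fix, offered as an assumption rather than a confirmed diagnosis: a function returns the value of its last expression, so if that expression is a print-style call that returns NULL, the next step in the pipe receives NULL. The sketch below uses a hypothetical print-only helper, show_table(), written to return NULL invisibly. It does not call pander() itself, and it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical print-only helper, assumed to return NULL invisibly.
show_table <- function(x) {
  print(x)
  invisible(NULL)
}

# The print call is the last expression, so its NULL is what gets returned.
make_table_bad <- function(d) {
  out <- d %>% filter(age > 40)
  show_table(out)
}

# Returning the table explicitly passes it on to the next step.
make_table_good <- function(d) {
  out <- d %>% filter(age > 40)
  return(out)
}

d <- data.frame(age = c(30, 45, 52))

bad  <- make_table_bad(d)
good <- make_table_good(d)

# A later filter() on the NULL result fails the way the thread describes.
err_msg <- tryCatch(
  bad %>% filter(age > 40),
  error = function(e) conditionMessage(e)
)
```

Keeping printing outside the function, and calling it on the assigned result, avoids this problem.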