Watts-College / cpp-527-fall-2021

A course shell for CPP 527 Foundations of Data Science II
https://watts-college.github.io/cpp-527-fall-2021/

Final Project - Step 7 Error Message #79

Open voznyuky opened 2 years ago

voznyuky commented 2 years ago

@lecy I updated all my vectors in d to line up and flow correctly. All the steps work, but when I run step 7:

d2 <- 
  d %>% 
  filter( title != "" ) %>% 
  filter( Department.Description %in% academic.units ) %>% 
  arrange( Department.Description, title )

nrow( d2 )

I get this message:

Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `title != ""`.
x comparison (2) is possible only for atomic and list types
Run `rlang::last_error()` to see where the error occurred.

I did some research, and some people said it might be a problem with the dplyr package, but dplyr is definitely in my library.

Any idea why this is not running correctly?

lecy commented 2 years ago

Does d contain title?
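
A quick way to check, sketched in base R on a toy data frame (the names and values below are made up for illustration, not taken from the project data). When d has no title column, the name title most likely resolves to the title() plotting function that base R attaches by default, and comparing a function to a string would produce this kind of "atomic and list types" error.

```r
# Toy stand-in for the salary data; values are hypothetical
d <- data.frame(
  Full.Name              = c( "Ana Ruiz", "Bo Chen" ),
  Department.Description = c( "School of Music", "Provost Office" ),
  stringsAsFactors       = FALSE
)

# Is there a column called title?
has_title <- "title" %in% names( d )
print( has_title )

# With no such column, the name title resolves to a function instead
print( is.function( title ) )

# Comparing a function to a string fails; capture the message rather than stopping
msg <- tryCatch( title != "", error = function( e ) conditionMessage( e ) )
print( msg )
```

If has_title is FALSE, the column needs to be created before step 7 runs.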

voznyuky commented 2 years ago

It does not. I need to check my code to see where I messed up.

If I run code_titles(d), the output shows title, but d by itself does not.

If I do this:

d <- code_titles(d)

then it works, but my total on step 7 is:

[1] 2156

lecy commented 2 years ago

The smaller total comes from this step:

filter( title != ""  )

Which is correct. You will drop cases that do not have titles because they are university administration or staff and do not belong in the faculty salary report.
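
To see how many rows a filter like this removes, you can count before and after. This is a minimal sketch on made-up data, using base R in place of the dplyr call so it runs without extra packages; the titles and departments are hypothetical.

```r
# Toy data; titles and departments are hypothetical
d <- data.frame(
  title = c( "Professor", "", "Lecturer", NA, "" ),
  Department.Description = c( "School of Music", "Provost Office",
                              "School of Music", "Facilities", "Athletics" ),
  stringsAsFactors = FALSE
)

n_before <- nrow( d )

# Same condition as the filter() call, written in base R
keep    <- !is.na( d$title ) & d$title != ""
n_after <- sum( keep )

# Rows removed because they have no coded title
n_dropped <- n_before - n_after
print( c( before = n_before, after = n_after, dropped = n_dropped ) )
```

Comparing the counts makes the drop a deliberate, visible choice rather than a surprise in the final total.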

voznyuky commented 2 years ago

Ah! Thank you for the help!

lecy commented 2 years ago

You do not want to drop data during the merge in steps 2-3, because at that point you are throwing out good data just because a name was not in the names database or the first-name parser did not code the name correctly.

It's fine to filter data later on to eliminate observations that are not part of the study.

The main distinction is you are making the choice to filter data in the latter case, whereas you are likely dropping observations without realizing it in the former case.
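
As a rough illustration of the difference, here is a sketch with base R merge(). The tables, column names, and values are all hypothetical and are not the project's actual data; the point is only that the default merge keeps matched rows, while all.x = TRUE keeps every row from the first table.

```r
# Hypothetical salary rows and a hypothetical first-name lookup table
salaries <- data.frame(
  first.name = c( "Ana", "Bo", "Cy" ),
  salary     = c( 90000, 85000, 70000 ),
  stringsAsFactors = FALSE
)
names.db <- data.frame(
  first.name = c( "Ana", "Cy" ),
  gender     = c( "F", "M" ),
  stringsAsFactors = FALSE
)

# Default merge keeps only matched names, so Bo disappears silently
inner <- merge( salaries, names.db, by = "first.name" )

# all.x = TRUE keeps every salary row and fills gender with NA when unmatched
left <- merge( salaries, names.db, by = "first.name", all.x = TRUE )

print( nrow( inner ) )
print( nrow( left ) )
```

With the second form, unmatched rows stay in the data with NA values, and you can decide later whether to filter them out.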

bbmoren2 commented 2 years ago

Hello, I also got stuck at this step. I followed the same steps and got the same original error:

d2 <- 
  d %>% 
  filter( title != "" & ! is.na(title) ) %>% 
  filter( Department.Description %in% academic.units ) %>% 
  arrange( Department.Description, title )

Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `title != "" & !is.na(title)`.
x comparison (2) is possible only for atomic and list types

I understand title is not a column in d and tried using the code_titles function but am getting another error:

d2 <- 
  code_titles(d) %>% 
  filter( title != "" & ! is.na(title) ) %>% 
  filter( Department.Description %in% academic.units ) %>% 
  arrange( Department.Description, title )

Error in UseMethod("filter") : 
  no applicable method for 'filter' applied to an object of class "factor"

I suspect I have been staring at my screen too long and am missing an obvious mistake... I would appreciate a push in the right direction!

lecy commented 2 years ago

@bbmoren2 what does your code_titles() function look like?

You might find this thread helpful: https://github.com/Watts-College/cpp-527-fall-2021/issues/68#issuecomment-939148158

I suspect you are doing the same thing Asia was: sending a data frame to the function and returning only the title. That is fine, but you then need to structure your data flow as follows:

d$title <- code_titles(d)

d2 <- 
  d %>% 
  filter( title != "" & ! is.na(title) ) %>% 
  filter( Department.Description %in% academic.units ) %>% 
  arrange( Department.Description, title )

It's a little more elegant to return the full data frame:

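code_titles <- function( d )
{
  ...

  d$title <- factor( title )   # attach the coded titles as a new column
  return( d )                  # return the whole data frame, not just the factor
}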

d <- code_titles( d )

Then this would work (pipes are sending data frames forward at each step, not individual factors):

d2 <- 
  d %>% 
  code_titles() %>% 
  filter( title != "" & ! is.na(title) ) %>% 
  filter( Department.Description %in% academic.units ) %>% 
  arrange( Department.Description, title )

Or this:

d <- code_titles( d )
d2 <- 
  d %>% 
  filter( title != "" & ! is.na(title) ) %>% 
  filter( Department.Description %in% academic.units ) %>% 
  arrange( Department.Description, title )
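
If you are not sure which design your function follows, a class() check on its return value will tell you. The two functions below are hypothetical stand-ins written only for this illustration (the real title-coding logic and the Job.Description column name are assumptions, not taken from the assignment).

```r
# Hypothetical version that returns only the coded titles
code_titles_vec <- function( d )
{
  title <- ifelse( grepl( "Prof", d$Job.Description ), "Professor", "" )
  return( factor( title ) )
}

# Hypothetical version that returns the full data frame with title attached
code_titles_df <- function( d )
{
  title <- ifelse( grepl( "Prof", d$Job.Description ), "Professor", "" )
  d$title <- factor( title )
  return( d )
}

# Toy input; Job.Description is an assumed column name
d <- data.frame(
  Job.Description  = c( "Assoc Professor", "Accountant" ),
  stringsAsFactors = FALSE
)

print( class( code_titles_vec( d ) ) )
print( class( code_titles_df( d ) ) )
```

A function that returns a factor has to be assigned to d$title before piping; a function that returns a data frame can sit directly in the pipe.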

bbmoren2 commented 2 years ago

Yup! That is exactly what I was doing.

Thank you for the push!!