Watts-College / cpp-527-fall-2021

A course shell for CPP 527 Foundations of Data Science II
https://watts-college.github.io/cpp-527-fall-2021/

Final project: utils.R #68

Open Johaning opened 2 years ago

Johaning commented 2 years ago

I want to make sure I'm understanding this step of the project. Is it right that we should write our functions in a file called utils.R instead of an .Rmd file like we normally do for labs?

I was getting confused looking for a complicated way to "store" the functions I've written in an .Rmd file, but realized that the above is probably what we're supposed to be doing.

lecy commented 2 years ago

Correct. If you want to see a good example of this, take a look at the Resume template you used for Lab 6.

The template Rmd file sources the helper functions, then calls them throughout:

source( 'parsing_functions.R' )   # similar to our utils.R 
...
sanitize_links( intro_text )
position_data %>% print_section( 'education' )

You can find all of these functions in the parsing_functions.R script:

https://github.com/DS4PS/cv/blob/master/parsing_functions.R

If we were creating the five reports (one for each year of salary data) separately, we would need 5 RMD files and in each we would have to include all of the code for the custom functions we have written.

If you wanted to update one of the functions, you would have to update every copy of it across the separate documents. That might work if you are running only 5 reports, but not if you are running 100 reports or 1,000 reports.

Sourcing the custom functions means we only have one version of each function, and if we update it in utils.R then all of the reports will use the updated version as well because they load the functions from utils.R.

It's a nice trick as you are trying to scale projects. Things are easy to maintain if you only have one version to update, and it's stored in a place that is easy to find.
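
A minimal sketch of the pattern, assuming a hypothetical helper `add_one()` and a temporary file standing in for utils.R (neither comes from the course materials):

```r
# Write a stand-in for utils.R to a temporary location.
# The helper add_one() is hypothetical and only illustrates the pattern.
utils_path <- file.path( tempdir(), "utils.R" )
writeLines( "add_one <- function( x ) { x + 1 }", utils_path )

# Each report would start with a single source() call
# instead of a pasted copy of the function definitions.
source( utils_path )

# After sourcing, the helper is available in the environment.
exists( "add_one" )     # TRUE
class( add_one )        # "function"
add_one( 1:3 )          # 2 3 4
```

Editing the one file and re-running `source()` is then enough for every report that loads it to pick up the change.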

It also makes the template file easier to use. If you want to switch the order of sections, it's much easier to move compact sections, because the data recipes have already been organized into functions instead of being written out step by step in each section.

Education {data-icon=graduation-cap data-concise=true}
--------------------------------------------------------------------------------

```{r}
position_data %>% print_section('education')
```

Selected Writing {data-icon=newspaper}
--------------------------------------------------------------------------------

```{r}
position_data %>% print_section('writings')
```

Johaning commented 2 years ago

Thank you, that explanation is helping it all come together!

aawoods97 commented 2 years ago

I wrote my functions in the utils.R file, but when I try to call the function code_titles I get the following error message: Error in code_titles(d) : could not find function "code_titles"

d <- code_titles(d)

lecy commented 2 years ago

Did you source the functions already?

source( "utils.R" )

After sourcing the functions you should see them in your environment:

ls()
 [1] "code_titles"     "d"        "d2"             

class( code_titles )
[1] "function"

aawoods97 commented 2 years ago

I did source the function; however, I am receiving conflicting information. In the environment, it is listed as a function, but when I look at the class, it is listed as a factor. How can I go about resolving this?

lecy commented 2 years ago

You are running the function and checking the class of the result it returns, not the class of the function itself.

Try:

class( code_titles )

# code_titles :  object name 
# code_titles( d )  :  new object that is returned from the function 
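
The difference can be seen with a toy example. This is only a sketch: `to_factor()` is a hypothetical stand-in for code_titles(), chosen because it also returns a factor.

```r
# Hypothetical toy function that returns a factor, standing in for code_titles().
to_factor <- function( x )
{
  return( factor( x ) )
}

x <- c( "a", "b", "a" )

class( to_factor )        # class of the function object itself: "function"
class( to_factor( x ) )   # class of the value the function returns: "factor"
```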

How are you building this function?

code_titles <- function( d )
{
  ...
  d$title <- factor( title, 
                   levels=c("Full Professor","Associate Professor",
                            "Assistant Professor","Teaching Faculty",
                            "Researcher" ) )
  return( ??? )
}

You can either return the full data frame d or the vector d$title. Looks like you are returning the vector?
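
A rough sketch of the two options, using toy data. The column name Job.Description follows the thread; the values and the two function names are hypothetical.

```r
# Toy data; the values and both functions below are illustrative only.
d <- data.frame( Job.Description = c( "Professor", "Asst Professor" ),
                 stringsAsFactors = FALSE )

# Option 1: return the full data frame with the new column attached.
add_title_df <- function( d )
{
  d$title <- factor( d$Job.Description )
  return( d )
}

# Option 2: return only the new vector.
add_title_vec <- function( d )
{
  title <- factor( d$Job.Description )
  return( title )
}

d1 <- add_title_df( d )          # the caller stores the returned data frame
d2 <- d
d2$title <- add_title_vec( d )   # the caller assigns the vector to a column

all.equal( d1, d2 )              # compare the results of the two routes
```

Either way the caller has to store what the function returns, since the data frame outside the function is not modified in place.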

aawoods97 commented 2 years ago

I have included my code below. After running the utils file and then the main file, I am still getting job descriptions such as 'Assoc Professor' that should have been replaced by the function.

code_titles <- function(d)
{
  title <- rep( "", nrow(d) )
  title[ grepl( "^Asst Professor$", d$Job.Description ) ]  <- "Assistant Professor"
  title[ grepl( "^Assoc Professor$", d$Job.Description ) ] <- "Associate Professor"

  title[ grepl( "^Professor$", d$Job.Description ) ]            <- "Full Professor"
  title[ grepl( "Regents Professor", d$Job.Description ) ]      <- "Full Professor"
  title[ grepl( "President's Professor", d$Job.Description ) ]  <- "Full Professor"

  title[ grepl( "^Postdoctoral Research Scholar$", d$Job.Description ) ]   <- "Researcher"
  title[ grepl( "Research Specialist", d$Job.Description ) ]               <- "Researcher"
  title[ grepl( "Research Analyst", d$Job.Description ) ]                  <- "Researcher"
  title[ grepl( "Postdoctoral Scholar", d$Job.Description ) ]              <- "Researcher"
  title[ grepl( "Research Scientist", d$Job.Description ) ]                <- "Researcher"
  title[ grepl( "Research Professional", d$Job.Description ) ]             <- "Researcher"
  title[ grepl( "Research Professor", d$Job.Description ) ]                <- "Researcher"

  title[ grepl( "^Instructor$", d$Job.Description ) ]           <- "Teaching Faculty"
  title[ grepl( "Clinical .+ Professor", d$Job.Description ) ]  <- "Teaching Faculty"
  title[ grepl( "Lecturer$", d$Job.Description ) ]              <- "Teaching Faculty"
  title[ grepl( "^Lecturer Sr$", d$Job.Description ) ]          <- "Teaching Faculty"
  title[ grepl( "^Principal Lecturer$", d$Job.Description ) ]   <- "Teaching Faculty"
  title[ grepl( "Professor of Practice", d$Job.Description ) ]  <- "Teaching Faculty"

  d$title <- factor( title, 
                     levels=c("Full Professor","Associate Professor",
                              "Assistant Professor","Teaching Faculty",
                              "Researcher" ) )
 return(d$Job.Description)
}

This is how I am calling the function: code_titles(d)

lecy commented 2 years ago

You are creating a new variable called title and storing it separately from Job.Description.

You currently don't return title from the function. You are just returning the original Job.Description variable, which has not been altered:

d$title <- factor( title )
return(d$Job.Description)

Here is a slightly easier way to structure the function - add title to the data frame, then return the full data frame.

code_titles <- function( d )
{
  ...

  d$title <- factor( title )
  return( d )
}

d <- code_titles( d )
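
One way to check that the recoding worked is to cross-tabulate the original descriptions against the new categories. The sketch below uses toy data and a reduced, hypothetical `code_titles_demo()` with only three rules; it is not the full function from the project.

```r
# Toy data: the last value deliberately matches none of the rules.
d <- data.frame( Job.Description = c( "Professor", "Assoc Professor",
                                      "Asst Professor", "Dean" ),
                 stringsAsFactors = FALSE )

# Reduced, hypothetical version of code_titles() with three rules only.
code_titles_demo <- function( d )
{
  title <- rep( "", nrow(d) )
  title[ grepl( "^Professor$", d$Job.Description ) ]       <- "Full Professor"
  title[ grepl( "^Assoc Professor$", d$Job.Description ) ] <- "Associate Professor"
  title[ grepl( "^Asst Professor$", d$Job.Description ) ]  <- "Assistant Professor"

  d$title <- factor( title,
                     levels=c( "Full Professor", "Associate Professor",
                               "Assistant Professor" ) )
  return( d )
}

d <- code_titles_demo( d )   # assign the result, otherwise d is unchanged

# Cross-tabulate to confirm each description landed in the intended category;
# descriptions that matched no rule show up in the NA column.
table( d$Job.Description, d$title, useNA = "ifany" )
```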

aawoods97 commented 2 years ago

Thank you for the clarification! It worked