Watts-College / cpp-527-fall-2021

A course shell for CPP 527 Foundations of Data Science II
https://watts-college.github.io/cpp-527-fall-2021/

Error-Batching #86

Open Jana-Ajeeb opened 2 years ago

Jana-Ajeeb commented 2 years ago

I was trying to knit salary-demo after creating the batch.R file, and this error appeared:


title: 'ASU Salary Report'
output:
  html_document:
    theme: readable
    highlight: zenburn
    toc: true
params:
  url:
    value: x


library( dplyr )
library( pander )
library( knitr )
library( gender )

source( 'utils.R' )   # load custom functions 

URL <- params$url     # load data 
d <- read.csv( URL )  

ERROR: Error in file(file, "rt") : cannot open the connection

lecy commented 2 years ago

Knit or render()?

You won’t be able to knit this file in RStudio because there is no data URL in the actual file. The URL gets passed in through render().
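
A minimal sketch of why knitting fails, assuming the url parameter keeps the placeholder default x from the YAML header and that no file named x exists in the working directory. The params list below is built by hand for illustration; in a real knit, rmarkdown creates it.

```r
# Stand-in for the params object rmarkdown builds when you knit
# without render(): url keeps its placeholder default "x".
params <- list( url = "x" )

# read.csv() treats "x" as a file path; capture the error message
# instead of stopping (assumes no file named "x" exists here).
msg <- tryCatch(
  suppressWarnings( read.csv( params$url ) ),
  error = function( e ) conditionMessage( e )
)

print( msg )
```

When render() is called with params = list( url = ... ), the placeholder is replaced by a real URL and read.csv() has something to open.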

Jana-Ajeeb commented 2 years ago

So I should run the code in the batch file, right?

I tried running this in the batch file, but it gives this error:

## 2020 REPORT

url.2020 <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv"
rmarkdown::render( input='salary-report.rmd', 
                   output_file = "ASU-2020-Salary-Report.HTML",
                   params = list( url = url.2020 ) )

Error:
Quitting from lines 32-67 (salary-report.rmd)
Error: Problem with filter() input ..1.
i Input ..1 is !is.na(title) & title != "".
x Input ..1 must be of size 12520 or 1, not size 51.

lecy commented 2 years ago

Yep, that's right. So what are lines 32-67 then?

Jana-Ajeeb commented 2 years ago

for( i in academic.units )
{

  cat( paste('<h1>', i ,'</h1>' ) )

  d %>%  
  filter( Department.Description == i ) %>% 
  create_salary_table(  ) %>% 
  build_graph( unit=i )

  cat( paste('<h3>', 'PAY RANGE BY RANK AND GENDER' ,'</h3>' ) )
    t.salary <- 
    d %>% 
    filter( Department.Description == i ) %>% 
    create_salary_table()
  cat( t.salary %>% knitr::kable(format="html") )

  cat( paste('<h3>', 'TOP 5 SALARIES' ,'</h3>' ) )
    topp <- 
    d %>% 
    filter( Department.Description == i ) %>% 
    top5()
  cat( topp %>% knitr::kable(format="html") )

  cat( '<br><hr><br>' )

}

lecy commented 2 years ago

Does your d contain title by the time it gets to the loop?

You still need the data steps in this template file:

  1. load
  2. add gender
  3. add title
  4. fix salary

Do these data steps with your sourced functions, though; do not define any new functions in the salary-report.rmd template.
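
One quick way to check this, sketched with a toy data frame: compare the columns the loop needs against names( d ) right before the loop starts. The column names in needed are assumptions (title comes from the error message; gender and salary are guesses), so adjust them to whatever the functions in utils.R actually create.

```r
# Toy stand-in for d; in the template the real d comes from read.csv()
d <- data.frame(
  Full.Name = c( "Doe, Jane", "Roe, Rick" ),
  Department.Description = c( "Unit A", "Unit B" ),
  stringsAsFactors = FALSE
)

# Columns the loop is assumed to need (hypothetical names, adjust as needed)
needed <- c( "title", "gender", "salary" )

# Anything listed here has not been added to d yet
missing.cols <- setdiff( needed, names( d ) )
print( missing.cols )
```

If missing.cols is not empty, the matching data step has not run on d before the loop.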

Jana-Ajeeb commented 2 years ago

So should I add all of the previous functions to salary-report.rmd, and not just the create_table and graph functions?

lecy commented 2 years ago

What do you mean by "add all of the functions"?

Use the functions in data steps to add all of the necessary variables to d prior to the loop? YES

Add code to create functions to the .rmd? NO

Jana-Ajeeb commented 2 years ago

Yes, everything was added before. [Screenshot: Capture]

lecy commented 2 years ago

In data steps in the template?

  1. load
  2. add gender
  3. add title
  4. fix salary

d <- read.csv( ... )
d$first.name <- get_first_name( d$Full.Name )
d <- add_gender( d )
d <- code_titles( d )
d$salary <- fix_salary( d$Salary )

# start of loop

Jana-Ajeeb commented 2 years ago

Tried it, still the same :(

Ma112120 commented 2 years ago

Same here

lecy commented 2 years ago

This error makes me suspect you have two objects of different sizes and are trying to use both in the same function:

x Input ..1 must be of size 12520 or 1, not size 51

Open your rmd template and temporarily add a load-data step (just make sure you remove it later). Then test all of your code.

URL <- 'https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv'
d <- read.csv( URL )

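I suspect that in one of your functions you name the argument something like d, but in the function code you reference d2. R will not complain about that if an object called d2 happens to exist in your global environment; it will quietly use that object instead, and it can have a different number of rows.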

do_something <- function( d )
{
  d2 %>% filter(...)
}
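
A small runnable sketch of that scoping problem, using toy data and hypothetical function names (count_rows_buggy and count_rows_fixed are not from utils.R):

```r
# A leftover object in the global environment (3 rows)
d2 <- data.frame( x = 1:3 )

# Buggy: the argument is d, but the body uses d2,
# so R falls back to the d2 defined outside the function
count_rows_buggy <- function( d )
{
  nrow( d2 )
}

# Fixed: the body only uses its own argument
count_rows_fixed <- function( d )
{
  nrow( d )
}

d <- data.frame( x = 1:10 )
buggy <- count_rows_buggy( d )   # row count of the outside d2
fixed <- count_rows_fixed( d )   # row count of the data passed in
print( c( buggy = buggy, fixed = fixed ) )
```

Clearing the environment before testing (or running the functions in a fresh R session) makes this kind of bug show up as an "object not found" error instead of a size mismatch.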

lecy commented 2 years ago

You got the error when running 2020 data?

You will want to add this to your loop before trying 2019 data to account for any missing departments:

for( i in academic.units )
{

  d2 <- filter( d, Department.Description == i )
  if( nrow(d2) == 0 ) { next }  
  # skip the rest of the code in the loop
  # and start over with the next department
  ...

}
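
Here is the same skip pattern as a self-contained sketch with toy data. The unit names and salaries are made up, and base R subsetting stands in for filter() so it runs without dplyr:

```r
# Toy data: "Unit B" appears in academic.units but has no rows in d
d <- data.frame(
  Department.Description = c( "Unit A", "Unit A", "Unit C" ),
  salary = c( 100, 120, 90 ),
  stringsAsFactors = FALSE
)
academic.units <- c( "Unit A", "Unit B", "Unit C" )

processed <- character( 0 )

for( i in academic.units )
{
  d2 <- d[ d$Department.Description == i, ]   # base R stand-in for filter()
  if( nrow( d2 ) == 0 ) { next }              # nothing to report, skip this unit
  processed <- c( processed, i )              # the report code would go here
}

print( processed )
```

Only the units with at least one row reach the report code, so a department missing from one year's data no longer breaks the run.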