Watts-College / paf-514-template

https://watts-college.github.io/paf-514-template/

Final Project- Setting up the files #52

Closed jasminacosta closed 2 months ago

jasminacosta commented 9 months ago

Hi @lecy!

Before starting the final project steps, I wanted to make sure the files work correctly, but I am having issues loading them.

Note: I did download Pandoc and update the Rmarkdown package.

Screenshot 2024-02-09 at 7 09 01 PM Screenshot 2024-02-09 at 7 09 31 PM Screenshot 2024-02-09 at 7 10 10 PM
lecy commented 9 months ago

This is a new issue caused by changes to GitHub. For some reason, downloading the RMD file from GitHub strips away the YAML header:

---
title: "Batch Report Demo"
output:
  html_document:
    theme: readable
    highlight: zenburn
    toc: true
params:
  url:
    value: x  
---

After you add that back to your RMD doc it should work. It should look like this:

https://github.com/Watts-College/paf-514-template/blob/main/labs/batch-demo/salary-report.rmd
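
A quick way to confirm whether a downloaded copy still has its header is to inspect the first lines of the file. The sketch below uses only base R; has_yaml_header() is a hypothetical helper, not part of the course files, and it only checks for the "---" delimiters rather than validating the YAML itself.

```r
# Hypothetical helper (not part of the course files): returns TRUE when the
# first non-empty line of an RMD file is the YAML delimiter "---" and at
# least one more "---" line follows to close the header.
has_yaml_header <- function(path) {
  lines <- trimws(readLines(path, warn = FALSE), which = "right")
  lines <- lines[lines != ""]
  length(lines) > 0 && lines[1] == "---" && sum(lines == "---") >= 2
}

# Demo on two temporary files, one with a header and one without
with_header <- tempfile(fileext = ".rmd")
writeLines(c("---",
             "title: \"Batch Report Demo\"",
             "params:",
             "  url:",
             "    value: x",
             "---",
             "",
             "Body text"), with_header)

without_header <- tempfile(fileext = ".rmd")
writeLines("Body text only", without_header)

has_yaml_header(with_header)     # header present
has_yaml_header(without_header)  # header stripped
```

On a real download you would call has_yaml_header("salary-report.rmd") from the folder that holds the file.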

jasminacosta commented 9 months ago

@lecy I added the yaml header but it is still not running the RMD doc.

Screenshot 2024-02-11 at 5 46 33 PM
lecy commented 9 months ago

I ran it today and it worked so I know the code is ok.

Try it again. Might be a bad connection.

You can test the data load step like this:

url.2020 <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv"
d <- read.csv( url.2020 )
jasminacosta commented 9 months ago

I tried the code provided and it did run properly.

Should I replace this part of the code with the new code provided?


# LOAD DATA 
URL <- params$url         #replace this part of the code 
d <- read.csv( URL )

Replace with the following code:


url.2020 <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv"
d <- read.csv( url.2020 )        #use this part of code instead?
lecy commented 9 months ago

For testing purposes that's fine. If you replace the current code in the RMD doc, though, you are hard-coding a single year of data into your RMD template, and it can no longer be used as part of a batch process to create reports across multiple years.

You don't actually knit the RMD file in this case. It is a template that you execute through the batch file:

Before you go on to next steps, download these three files to your local working directory. Open batch.R in a regular R console and try running the reports to create the HTML files.

# utils.R and salary-report.rmd should be in the working directory

## 2020 REPORT
url.2020 <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv"
rmarkdown::render( input='salary-report.rmd', 
                   output_file = "ASU-2020-Salary-Report.HTML",
                   params = list( url = url.2020 ) )

It will make more sense after you finish Lab-06 and use the Resume template.
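
To illustrate why the URL stays a parameter, here is a minimal sketch of a batch loop over several years. It is not the contents of batch.R: render_reports() and dry_run() are hypothetical names, and the 2019 entry is a placeholder, since only the 2020 URL appears in this thread.

```r
# Hypothetical helper: render one report per named URL.
# 'render_fun' defaults to rmarkdown::render; it is an argument only so the
# loop can be tried out without actually knitting anything.
render_reports <- function(urls, template = "salary-report.rmd",
                           render_fun = rmarkdown::render) {
  outputs <- paste0("ASU-", names(urls), "-Salary-Report.HTML")
  for (i in seq_along(urls)) {
    render_fun(input = template,
               output_file = outputs[i],
               params = list(url = urls[[i]]))
  }
  invisible(outputs)
}

# Stand-in for rmarkdown::render that only prints the job it was given
dry_run <- function(input, output_file, params) {
  cat(input, "->", output_file, "\n")
}

urls <- c(
  "2020" = "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv",
  "2019" = "PLACEHOLDER-URL-FOR-2019"   # assumption: not a real link
)

out <- render_reports(urls, render_fun = dry_run)
```

Dropping the render_fun argument would call rmarkdown::render() for each year, which requires real URLs and the template in the working directory.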

jasminacosta commented 9 months ago

@lecy So is it fine for me to start working on it, despite these errors popping up?

lecy commented 9 months ago

You shouldn't be getting errors if you are running it correctly.

My question is whether you are executing as described in the assignment?

"Before you go on to next steps, download these three files to your local working directory. Open batch.R in a regular R console and try running the reports to create the HTML files."

# utils.R and salary-report.rmd should be in the working directory

## 2020 REPORT
url.2020 <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv"
rmarkdown::render( input='salary-report.rmd', 
                   output_file = "ASU-2020-Salary-Report.HTML",
                   params = list( url = url.2020 ) )

Are you running this rmarkdown::render() command from an R console with utils.R and salary-report.rmd saved in your current working directory?

jasminacosta commented 9 months ago

@lecy I used getwd() to see what the directory is for each file and these are the results.

For utils.R the directory is:


> getwd()
[1] "/Users/jasminacosta/montyhall"

For salary-report.rmd it is:


> getwd()
[1] "/Users/jasminacosta/Downloads"

Then for batch.R it is:


> getwd()
[1] "/Users/jasminacosta/montyhall"

However, I saw that all my files were in my downloads folder.

I am really confused.

lecy commented 9 months ago

If all of your files are in the downloads folder, you should be able to knit the report as follows:

setwd( "/Users/jasminacosta/Downloads" )

## 2020 REPORT
url.2020 <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv"
rmarkdown::render( input='salary-report.rmd', 
                   output_file = "ASU-2020-Salary-Report.HTML",
                   params = list( url = url.2020 ) )

Try that and let me know if it works.
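
Since getwd() reports the session's working directory rather than where a file lives, it may help to check directly that the files are where R will look for them. A minimal sketch in base R, assuming the three files are batch.R, utils.R, and salary-report.rmd; missing_files() is a hypothetical helper.

```r
# Hypothetical helper: list which of the required files are missing from a folder
missing_files <- function(dir = getwd(),
                          files = c("batch.R", "utils.R", "salary-report.rmd")) {
  files[!file.exists(file.path(dir, files))]
}

# Demo in a temporary folder that contains only two of the three files
demo_dir <- file.path(tempdir(), "batch-demo-check")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("batch.R", "utils.R")))

missing_files(demo_dir)   # reports the file that is not there
```

Calling missing_files() with no arguments checks the current working directory; an empty result means all three files were found.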

jasminacosta commented 9 months ago

@lecy It seemed to work for batch.R and utils.R, but I am still having issues with salary-report.rmd.

Screenshot 2024-02-12 at 5 19 07 PM Screenshot 2024-02-12 at 5 19 40 PM
lecy commented 9 months ago

You can't always knit a template RMD directly in RStudio. You would knit using the command in batch.R:

setwd( "/Users/jasminacosta/Downloads" )

## 2020 REPORT
url.2020 <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv"
rmarkdown::render( input='salary-report.rmd', 
                   output_file = "ASU-2020-Salary-Report.HTML",
                   params = list( url = url.2020 ) )

When you run this do you create an HTML file called "ASU-2020-Salary-Report.HTML"?

jasminacosta commented 9 months ago

@lecy Yes, I believe so.

Screenshot 2024-02-13 at 3 24 24 PM Screenshot 2024-02-13 at 3 24 54 PM
lecy commented 9 months ago

Check your file list to confirm:

dir()

Or else open your downloads folder and look for the file.

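You can also open files from R. Note that shell() only exists in R for Windows; on macOS or Linux use browseURL() instead:

# assuming you are in the right wd
shell( "ASU-2020-Salary-Report.HTML" )       # Windows only
browseURL( "ASU-2020-Salary-Report.HTML" )   # macOS or Linux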
jasminacosta commented 9 months ago

@lecy This is the output I am getting, and I am not sure why I am not getting the directory listing from dir().

Screenshot 2024-02-13 at 4 18 34 PM
lecy commented 9 months ago

Selection: means R is waiting for you to answer a prompt while it tries to install a package. You need to install genderdata before you can run the file.

library( gender )
gender("sara")

The genderdata package needs to be installed.
Install the genderdata package? 
1: Yes
2: No

Selection: 
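
One way to avoid an interactive prompt in the middle of a render is to check for missing packages beforehand. The sketch below uses base R only; missing_packages() is a hypothetical helper, and the package list is assembled from the packages mentioned in this thread, so it may not be complete.

```r
# Hypothetical helper: return the names of packages that are not installed.
# requireNamespace() tries to load each package without attaching it.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages referenced in this thread (assumed list, may be incomplete)
needed <- c("rmarkdown", "dplyr", "pander", "knitr", "gender", "genderdata")

missing_packages(needed)
```

Anything the call returns should be installed from a fresh console before running rmarkdown::render().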
lecy commented 9 months ago

Also a good example of why to share code instead of screen shots. In the base R console your code would look like this:

setwd("ds2")
url.2020 <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv"
rmarkdown::render( input='salary-report.rmd', 
                   output_file = "ASU-2020-Salary-Report.HTML",
                   params = list( url = url.2020 ) )

processing file: salary-report.rmd
  |                                                                             |   0%   
  |...........                                                                  |  14%                   
  |......................                                                       |  29% [setup]             
  |.................................                                            |  43%                      
  |............................................                                 |  57% [unnamed-chunk-1]Selection:                     

That would give a clue about where the script is getting stuck and why.

Your screen shot is not reproducible - it doesn't show what code was run and which behavior it produced.

[unnamed-chunk-1]  # where the problem occurs
Selection:         # which message you are receiving 
jasminacosta commented 9 months ago

@lecy Understood.

I installed genderdata and tried to run the code in the R console.


 setwd( "/Users/jasminacosta/Downloads" )
## 2020 REPORT
url.2020 <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv"
rmarkdown::render( input='salary-report.rmd',
                    output_file = "ASU-2020-Salary-Report.HTML",
                    params = list( url = url.2020 ) )

And I am getting this error:


processing file: salary-report.rmd
  |..................................................        |  86% [unnamed-chunk-2]
Quitting from lines 80-100 [unnamed-chunk-2] (salary-report.rmd)
Error in `group_by()`:
! Must group by variables found in `.data`.
Column `title` is not found.
Column `gender` is not found.
Backtrace:
 1. ... %>% mutate(p = round(n / sum(n), 2))
 6. dplyr:::group_by.data.frame(., title, gender)
lecy commented 9 months ago

Try closing R Studio and executing from a base R console, please.

jasminacosta commented 9 months ago

@lecy I closed R studio and executed the code from the R console and I am still getting the same error.

lecy commented 9 months ago

Your data is not loading correctly but I'm not sure if it is a problem with your package setup, a problem sourcing the utils.R file because of directory changes, or maybe a connection issue with Google Sheets.

Can you please try the following and tell me what you get:

library( dplyr )
library( pander )
library( knitr )
library( gender )
source( "utils.R" )

URL <- "https://raw.githubusercontent.com/Watts-College/paf-514-template/main/labs/batch-demo/asu-salaries-2020.csv"
d <- read.csv( URL )
d$first.name <- get_first_name( d$Full.Name )
d <- add_gender( d )
d <- add_titles( d )
d <- fix_salary( d )

d <-      
  d %>% 
  filter( title != "" & ! is.na(title) ) %>% 
  filter( Department.Description %in% academic.units )
head(d) %>% pander::pander()
jasminacosta commented 9 months ago

@lecy I got the following when I ran it through the R console.


----------------------------------------------------------------
 first.name   Calendar.Year      Full.Name      Job.Description 
------------ --------------- ----------------- -----------------
   Aaron          2020         Baker, Aaron        Professor    

   Aaron          2020        Fellmeth, Aaron      Professor    

   Aaron          2020         Redman, Aaron      Instructor    

   Aaron          2020        Crippen, Aaron      Instructor    

   Aaron          2020          Bae, Aaron         Lecturer     

   Aaron          2020         Romans, Aaron      Instructor    
----------------------------------------------------------------

Table: Table continues below

------------------------------------------------------------------------------
    Department.Description        Salary      FTE   gender        title       
------------------------------ ------------- ----- -------- ------------------
           English              $107,160.00   100    male     Full Professor  

        College Of Law          $164,755.00   100    male     Full Professor  

  SOS Faculty & Researchers     $50,000.00    100    male    Teaching Faculty 

           English              $52,600.00    100    male    Teaching Faculty 

  School of Social Transform    $52,750.00    100    male    Teaching Faculty 

 Social & Behavioral Sciences   $47,979.00    80     male    Teaching Faculty 
------------------------------------------------------------------------------

Table: Table continues below

--------
 salary 
--------
 107160 

 164755 

 50000  

 52600  

 52750  

 59974  
--------
lecy commented 9 months ago

Ok now add:

t.salary <- 
  d %>% 
  group_by( title, gender ) %>% 
  summarize( q25=quantile(salary,0.25),
             q50=quantile(salary,0.50),
             q75=quantile(salary,0.75),
             n=n() ) %>% 
  ungroup() %>% 
  mutate( p= round( n/sum(n), 2) )

t.salary %>% build_graph( unit="ALL ASU")
jasminacosta commented 9 months ago

@lecy I also ran that in the R console and received these results:


> t.salary <- 
+     d %>% 
+     group_by( title, gender ) %>% 
+     summarize( q25=quantile(salary,0.25),
+                q50=quantile(salary,0.50),
+                q75=quantile(salary,0.75),
+                n=n() ) %>% 
+     ungroup() %>% 
+     mutate( p= round( n/sum(n), 2) )
`summarise()` has grouped output by 'title'. You can override using the `.groups`
argument.
> 
> t.salary %>% build_graph( unit="ALL ASU")
NULL

Along with a graph of salaries.

Screenshot 2024-02-13 at 6 39 28 PM
lecy commented 9 months ago

Last thing to test, then:

URL <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv"
d <- read.csv( URL )

If that works, then my guess is that it will run once you shut down R, open a fresh console, and try again. It could have been a ghost in the machine from other files you had open.

Otherwise I am stumped, because it works on my computer and all of the steps above work fine. If you still get an error, please email me your RMD and utils.R files.

(I see that you are still running your files from R Studio, not a base R console. I don't think that would be the problem, though. Let's see if it works.)

jasminacosta commented 9 months ago

@lecy I ran that as well in the R console and it worked fine.

I shut down R and ran the code to see if the salary-report.rmd would run properly, but I am still getting the same error.

Let me send over my files.

lecy commented 9 months ago

Check your email - you added extra code to the salary-report.rmd file that was overwriting the prior data steps. If you use the original version it should work fine.
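
A guard placed just before the group_by() step can surface this kind of overwrite with a clearer message than "Column `title` is not found". This is a minimal sketch, not code from the template; require_columns() is a hypothetical helper and the two data frames are made-up stand-ins for the processed and raw data.

```r
# Hypothetical helper: stop with a readable message when expected columns are absent
require_columns <- function(d, cols) {
  missing <- setdiff(cols, names(d))
  if (length(missing) > 0) {
    stop("Missing columns: ", paste(missing, collapse = ", "), call. = FALSE)
  }
  invisible(d)
}

# Stand-in for data after add_gender() and add_titles() have run
d_ok <- data.frame(title = "Full Professor", gender = "male", salary = 107160)

# Stand-in for raw data that was reloaded and overwrote the processed version
d_raw <- data.frame(Full.Name = "Baker, Aaron", Salary = "$107,160.00")

require_columns(d_ok, c("title", "gender"))   # passes silently

msg <- tryCatch(require_columns(d_raw, c("title", "gender")),
                error = function(e) conditionMessage(e))
msg
```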

jasminacosta commented 9 months ago

@lecy Oh, I see!

I removed those lines and when I ran the code:


setwd( "/Users/jasminacosta/Downloads" )
## 2020 REPORT
url.2020 <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv"
rmarkdown::render( input='salary-report.rmd',
                   output_file = "ASU-2020-Salary-Report.HTML",
                   params = list( url = url.2020 ) )

I got the output


processing file: salary-report.rmd

output file: salary-report.knit.md

/usr/local/bin/pandoc +RTS -K512m -RTS salary-report.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output ASU-2020-Salary-Report.HTML --lua-filter /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/rmarkdown/rmarkdown/lua/latex-div.lua --embed-resources --standalone --variable bs3=TRUE --section-divs --table-of-contents --toc-depth 3 --template /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library/rmarkdown/rmd/h/default.html --highlight-style zenburn --variable theme=readable --mathjax --variable 'mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --include-in-header /var/folders/tw/fh99fdy16cg1d8ppmstjmp5w0000gn/T//RtmpI0n6uY/rmarkdown-str121c72b99f561.html 

Output created: ASU-2020-Salary-Report.HTML

Then I went to knit salary-report.rmd directly and got the error below. I am not sure if that is what is supposed to happen.


processing file: salary-report.rmd

Quitting from lines 30-66 [unnamed-chunk-1] (salary-report.rmd)

Error in `file()`:
! cannot open the connection
Backtrace:
 1. utils::read.csv(URL)
 2. utils::read.table(...)
 3. base::file(file, "rt")
Execution halted
lecy commented 9 months ago
Output created: ASU-2020-Salary-Report.HTML

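Success! You can preview the HTML file with one of the following (shell() only exists in R for Windows):

shell( "ASU-2020-Salary-Report.HTML" )       # Windows only
browseURL( "ASU-2020-Salary-Report.HTML" )   # macOS or Linux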

This is how you are knitting your report template:

rmarkdown::render( input='salary-report.rmd', ... )

When you are creating batches of dozens or hundreds of reports you would not want a separate RMD file for each report that you have to open in R Studio and knit manually.

The steps above are to test your environment - ensure the packages are installed and working correctly, that you can find your files in your project directory, that the rmarkdown package can call pandoc ok, etc. This step was designed to identify project configuration errors before you start implementing the steps on your own.

jasminacosta commented 9 months ago

@lecy Yay!

Got it, that makes sense.

In that case, can I begin the project, or does the following code also need to run properly first?


rmarkdown::render( input='salary-report.rmd' )

Because when I ran that code to see what would happen I got the error:


processing file: salary-report.rmd
  |.................................                         |  57% [unnamed-chunk-1]
Quitting from lines 30-66 [unnamed-chunk-1] (salary-report.rmd)
Error in `file()`:
! cannot open the connection
Backtrace:
 1. utils::read.csv(URL)
 2. utils::read.table(...)
 3. base::file(file, "rt")

Just want to make sure I am setting up my files correctly before I start adding code to it!

lecy commented 9 months ago

The dots were for the omitted parts:

## 2020 REPORT
url.2020 <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv"
rmarkdown::render( input='salary-report.rmd',
                   output_file = "ASU-2020-Salary-Report.HTML",
                   params = list( url = url.2020 ) )

You should be all set.
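
A likely explanation for the "cannot open the connection" error, assuming the template still carries the placeholder default from the YAML header shown earlier in this thread (url value: x): without a params argument, params$url is just "x", and read.csv() cannot open that as a file or URL. The sketch below reproduces that failure in base R under that assumption.

```r
# Assumption: params$url falls back to the placeholder "x" from the YAML
# header when rmarkdown::render() is called without a params argument.
placeholder_url <- "x"

# Capture the error message instead of halting the session
result <- tryCatch(
  suppressWarnings(read.csv(placeholder_url)),
  error = function(e) conditionMessage(e)
)

# A character result means read.csv() failed on the placeholder value
is.character(result)
result
```

Passing params = list( url = url.2020 ) replaces the placeholder with a real address, which is why the full command succeeds.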

jasminacosta commented 9 months ago

@lecy Understood.

Thank you for all your help!