Open AhmedRashwanASU opened 2 years ago
Which year are you looking at specifically?
2019, 2018, 2017, and 2016.
You can drop the note (column X). It would be better to specify which columns to keep because that would be consistent across years, but for demo purposes:
> URL <- 'https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1948400967&format=csv'
> d <- read.csv( URL )
> d2 <- dplyr::select( d, -X )
>
> head(d)
Calendar.Year Full.Name Job.Description
1 2019 Abadjivor,Enyah Coordinator
2 2019 Abbas,James Assoc Professor
3 2019 Abbaszadegan,Morteza Professor
4 2019 Abbe,Scott Tech Support Analyst Coord
5 2019 Abbl,Norma Sr HR Consultant
6 2019 Abbott,David Assoc Professor
Department.Description Salary FTE
1 Research Division 2 Tempe $45,195.00 100
2 Sch Biological & Hlth Sys Engr $101,795.00 100
3 Sch Sustain Engr & Built Envrn $143,625.00 100
4 Engineering Technical Services $95,560.00 100
5 HR Partners $86,806.00 100
6 Shesc $86,188.00 100
X
1 NOTE: This data is public record salary data of Arizona State University employees, compiled by The State Press. Last updated Dec. 5, 2017. View the numbers in a searchable database at http://www.statepress.com/article/2017/04/spinvestigative-salary-database
2
3
4
5
6
>
> head(d2)
Calendar.Year Full.Name Job.Description
1 2019 Abadjivor,Enyah Coordinator
2 2019 Abbas,James Assoc Professor
3 2019 Abbaszadegan,Morteza Professor
4 2019 Abbe,Scott Tech Support Analyst Coord
5 2019 Abbl,Norma Sr HR Consultant
6 2019 Abbott,David Assoc Professor
Department.Description Salary FTE
1 Research Division 2 Tempe $45,195.00 100
2 Sch Biological & Hlth Sys Engr $101,795.00 100
3 Sch Sustain Engr & Built Envrn $143,625.00 100
4 Engineering Technical Services $95,560.00 100
5 HR Partners $86,806.00 100
6 Shesc $86,188.00 100
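The "specify which columns to keep" idea mentioned above could look roughly like the sketch below. It uses a small toy data frame in place of the real sheet, and it assumes the six column names seen in the head() output are the same in every year, which has not been checked here. The name keep.cols is made up for illustration.

```r
# Toy stand-in for the 2019 sheet; the real data would come from read.csv( URL ).
d <- data.frame(
  Calendar.Year          = c( 2019, 2019 ),
  Full.Name              = c( "Abadjivor,Enyah", "Abbas,James" ),
  Job.Description        = c( "Coordinator", "Assoc Professor" ),
  Department.Description = c( "Research Division 2 Tempe",
                              "Sch Biological & Hlth Sys Engr" ),
  Salary                 = c( "$45,195.00", "$101,795.00" ),
  FTE                    = c( 100, 100 ),
  X                      = c( "NOTE: placeholder text", "" ),  # stand-in for the note column
  stringsAsFactors = FALSE
)

# Columns to keep (assumption: these names are shared by every year).
keep.cols <- c( "Calendar.Year", "Full.Name", "Job.Description",
                "Department.Description", "Salary", "FTE" )

# Keep only the named columns; extras such as X are dropped.
d2 <- d[ , keep.cols ]
names( d2 )
```

Selecting by name fails loudly if a year is missing one of the columns, which is usually what you want in a batch report.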
Full-Time Equivalency is scaled differently, max value of either 1 or 100.
You might need to add some conditionality to your normalization function.
if( max(FTE) == 100 )
{ salary <- salary / (FTE/100) }
if( max(FTE) == 1 )
{ salary <- salary / FTE }
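A minimal sketch of that conditional as a single function. The name normalize_salary is hypothetical, and two assumptions are built in: Salary arrives as text like "$45,195.00" (as in the head() output above), and any year whose largest FTE exceeds 1 is on the 0-100 scale. The second is a slightly looser test than max(FTE) == 100.

```r
# Hypothetical helper: parse salary strings and scale them to a full-time basis.
normalize_salary <- function( salary, fte )
{
  # Strip dollar signs and commas before converting to numeric.
  salary <- as.numeric( gsub( "[$,]", "", salary ) )
  fte    <- as.numeric( fte )

  # Assumption: a max above 1 means FTE is on a 0-100 scale, so rescale to 0-1.
  if( max( fte, na.rm=TRUE ) > 1 ) { fte <- fte / 100 }

  salary / fte
}

# Same two employees, FTE recorded on each scale.
normalize_salary( c( "$45,195.00", "$50,000.00" ), c( 100, 50 ) )
normalize_salary( c( "$45,195.00", "$50,000.00" ), c( 1, 0.5 ) )
```

Both calls should give the same full-time-equivalent salaries, since a half-time employee earning $50,000 is scaled up to $100,000 either way.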
Does that make sense to you?
Yup, will apply the same. Thanks, Prof.
The joy of being a data analyst is that the world is marching toward entropy and your job is to create order and meaning from the chaos ;-)
Just to confirm: this function works on the 2020 data but returns NA on the 2019 data. Any idea why? Note that the code below only tests the main function.
name.first <- sapply(strsplit(d2$Full.Name, " "), `[`, 2)
head(name.first)
[1] NA NA NA NA NA NA
The 2020 data returns the following:
"Mohammad" "Jose" "Kelsea" "Enyah" "Precious" "James"
@lecy
Can you see what changed between 2019 and 2020? What should you use as the delimiter instead of a space?
### 2019 DATA
[65] "Adelman,Madelaine" "Adler,Patricia"
[67] "Adrian,Ronald" "Adusumilli,Sesha Chandra"
[69] "Afanador Pujol,Angelica" "Affolter,Jacob"
[71] "Afsari Mamaghani,Sepideh" "Aganaba,Timiebi"
### 2020 DATA
[1] "ABBASI, Mohammad" "ARQUIZA, Jose Maria Reynaldo Apollo"
[3] "Aaberg, Kelsea" "Abadjivor, Enyah"
[5] "Abayesu, Precious"
Note that your heuristic above will fail when there are two last names:
"Afanador Pujol, Angelica"
It also won't return a single first name when the string includes middle names:
"ARQUIZA, Jose Maria Reynaldo Apollo"
See some hints here: https://github.com/Watts-College/cpp-527-fall-2021/issues/67#issuecomment-937135609
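One possible way to handle both name formats, sketched in base R. The function name get_first_name is made up, and this is not necessarily the approach in the linked hints. It splits on the comma rather than the space, trims whitespace, and keeps only the first given name.

```r
# Hypothetical helper: pull a single first name out of "Last,First Middle" strings.
get_first_name <- function( full.name )
{
  # Drop everything up to and including the first comma (the last-name part).
  given <- sub( "^[^,]*,", "", full.name )

  # Remove leading/trailing spaces, e.g. " Jose" becomes "Jose".
  given <- trimws( given )

  # Keep only the first word so middle names are discarded.
  sub( " .*$", "", given )
}

x <- c( "Adler,Patricia",                        # 2019 style, no space after comma
        "Afanador Pujol,Angelica",               # two last names
        "ARQUIZA, Jose Maria Reynaldo Apollo",   # 2020 style with middle names
        "Aaberg, Kelsea" )

get_first_name( x )
```

Because the split is on the comma, a space inside the last name no longer shifts the position of the first name.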
Just make sure you don't leave a space in front of the first name or the gender package will fail to match the name.
" Jose" # no return value from gender package when there is a leading space
@lecy I am still getting an error message when I try to run the graph with 2019 data, but I think I've figured out where it might be coming from. I am not able to generate the graph for some units in 2019.
build_graph( t.salary, unit="Ldrshp and Integrative Studies" ) ## Does not run for 2019
It looks like this Department.Description does not exist in the 2019 dataset so when I run through the academic.units provided, it is searching for something that doesn't exist. I'm not sure how to adapt the academic.units for each year since we only want a subset of the department descriptions.
Here's a great use of control structures.
Rule: if the academic unit does not exist in the dataset then skip it:
for( i in academic.units )
{
  d2 <- filter( d, Department.Description == i )
  if( nrow(d2) == 0 ) { next }  # skips the rest of the code in the loop for this department
  ...
}
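Here is a small self-contained version of the skip rule above that can be run on its own. It uses toy data and base R subsetting in place of filter(); the salary figures and the processed vector are made up for illustration only.

```r
# Toy data: one unit from the list is present, the other is missing.
d <- data.frame(
  Department.Description = c( "Shesc", "Shesc", "HR Partners" ),
  Salary                 = c( 86188, 90000, 86806 ),   # illustrative values
  stringsAsFactors = FALSE
)

academic.units <- c( "Shesc", "Ldrshp and Integrative Studies" )

processed <- character( 0 )   # records which units made it past the check

for( i in academic.units )
{
  # Base R equivalent of filter( d, Department.Description == i )
  d2 <- d[ d$Department.Description == i, ]

  # No rows means the unit is absent this year, so skip to the next one.
  if( nrow( d2 ) == 0 ) { next }

  # The graph-building code would go here; this sketch only logs the unit.
  processed <- c( processed, i )
}

processed
```

Only "Shesc" is processed; the missing unit is skipped without raising an error.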
Thank you, Dr. Lecy! That was the final piece of the puzzle.
Thanks for identifying the problem department.
I added that tip to the instructions for others as well.
@lecy Not sure whether the 2019 data will load from the edit URL instead of the export link:
##### BATCH.R FILE
## 2020 REPORT
url.2020 <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/export?gid=1335284952&format=csv"
rmarkdown::render( input='salary-report.rmd',
output_file = "ASU-2020-Salary-Report.HTML",
params = list( url = url.2020 ) )
## 2019 REPORT
url.2019 <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/edit#gid=1948400967"
rmarkdown::render( input='salary-report.rmd',
output_file = "ASU-2019-Salary-Report.HTML",
params = list( url = url.2019 ) )
No, it needs to be converted to the same format as the 2020 data (export CSV version):
> url.2019 <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/edit#gid=1948400967"
> d <- read.csv( "url.2019" )
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
cannot open file 'url.2019': No such file or directory
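The conversion to the export format can be sketched with a single substitution. The helper name to_export_url is hypothetical, and the pattern assumes the edit link ends in "/edit#gid=" followed by digits, as the 2019 link does.

```r
# Hypothetical helper: rewrite a Google Sheets edit link as a CSV export link.
to_export_url <- function( url )
{
  # Capture the numeric gid and move it into the export query string.
  sub( "/edit#gid=([0-9]+)$", "/export?gid=\\1&format=csv", url )
}

url.2019 <- "https://docs.google.com/spreadsheets/d/1RoiO9bfpbXowprWdZrgtYXG9_WuK3NFemwlvDGdym7E/edit#gid=1948400967"

url.2019.csv <- to_export_url( url.2019 )
url.2019.csv

# To load the data, pass the variable itself, not its quoted name:
# d <- read.csv( url.2019.csv )
```

The result matches the export link used for the 2019 sheet at the top of this thread.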
Good day, Prof.
Please note that the salary files for the other years have an added column X containing some notes, and the FTE calculation is different from the 2020 data. After setting column X to NULL to delete it, the first function, which extracts the first names, returns null results.
Any idea how to solve this?