Watts-College / cpp-527-spr-2022

https://watts-college.github.io/cpp-527-spr-2022/

Final Project: Should we use salary or full-time salary? #35

Open chmao99 opened 2 years ago

chmao99 commented 2 years ago

In Part I, step 6 of the final project, we convert salary to full-time salary. However, I believe the table in step 8 was generated from the original salary data. Should we use full-time salary in the later steps?

jacobtnyoung commented 2 years ago

Hi @chmao99, it looks like the table uses the full-time salary.

The instructions page shows that the variable salary is created to reflect full-time salary:

salary <- d$salary / (d$FTE / 100)

Then the code for the table refers to salary (lowercase "s"):

t.salary <- 
  d %>% 
  filter( ! is.na( title ) & title != "" ) %>% 
  group_by( title, gender ) %>% 
  summarize( q25=quantile(salary,0.25),
             q50=quantile(salary,0.50),
             q75=quantile(salary,0.75),
             n=n() ) %>% 
  ungroup() %>% 
  mutate( p= round( n/sum(n), 2) )

pander( t.salary )

That is what I used in my code for the project.

chmao99 commented 2 years ago

Thanks for your quick reply! However, shouldn't "salary" inside summarize() refer to "d$salary", since you are in the pipeline of "d"? I added a salary.fulltime variable to "d". I got exactly the same table as the instructions page when I used the original salary, but a slightly different one when I used salary.fulltime.

jacobtnyoung commented 2 years ago

Does your salary.fulltime variable match salary <- d$salary / (d$FTE / 100)?

chmao99 commented 2 years ago

Yes, this is my code.

if( max(d$FTE) == 100 ) 
{ salary <- d$Salary / (d$FTE/100) } 

if( max(d$FTE) == 1 ) 
{ salary <- d$Salary / d$FTE } 

d <- cbind(d, Salary.Fulltime = salary)
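The same conversion can be wrapped in a small function and tried on toy data. This is only a minimal sketch: the name `to_fulltime` and the toy values are hypothetical and not from the project instructions. It also swaps the two `== 100` / `== 1` checks for a single `> 1` check and adds `na.rm = TRUE`, on the assumption that FTE is recorded on either a 0-100 or a 0-1 scale and may contain missing values.

```r
# Hypothetical helper: convert salary to its full-time equivalent.
# Assumes FTE is recorded either on a 0-100 scale or on a 0-1 scale.
to_fulltime <- function( salary, fte ) {
  # na.rm = TRUE so a missing FTE does not turn the condition into NA
  if( max( fte, na.rm = TRUE ) > 1 ) {
    fte <- fte / 100
  }
  salary / fte
}

# Toy data, not from the project dataset
toy <- data.frame( Salary = c( 50000, 90000, 60000 ),
                   FTE    = c( 50, 100, 75 ) )

# Store the result as a column so later pipeline steps can see it
toy$Salary.Fulltime <- to_fulltime( toy$Salary, toy$FTE )
toy
```

A half-time salary of 50000 becomes 100000, and a salary at FTE 100 is unchanged.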

And in the function "create_salary_table()", when I use the original salary data like this

summarize( q25=quantile(Salary,0.25),
           q50=quantile(Salary,0.50),
           q75=quantile(Salary,0.75),
           n=n() ) %>%

I get exactly the same table as on your instructions page. I also tried "salary" as in your code, changing "Salary" to "salary" in the function. It returns the table below.


title                 gender    q25     q50     q75      n     p
Full Professor        male      57464   90561   117731   338   0.14
Full Professor        female    57464   90561   117731   141   0.06
Full Professor        uncoded   57464   90561   117731   56    0.02
Associate Professor   male      57464   90561   117731   229   0.09
Associate Professor   female    57464   90561   117731   180   0.07
Associate Professor   uncoded   57464   90561   117731   52    0.02
Assistant Professor   male      57464   90561   117731   147   0.06
Assistant Professor   female    57464   90561   117731   141   0.06
Assistant Professor   uncoded   57464   90561   117731   66    0.03
Teaching Faculty      male      57464   90561   117731   319   0.13
Teaching Faculty      female    57464   90561   117731   378   0.15
Teaching Faculty      uncoded   57464   90561   117731   57    0.02
Researcher            male      57464   90561   117731   169   0.07
Researcher            female    57464   90561   117731   114   0.05
Researcher            uncoded   57464   90561   117731   72    0.03

To be honest, I do not fully understand your code. Is salary just an independent vector, separate from d?

jacobtnyoung commented 2 years ago

Hi @chmao99!

Yes, I just pulled the code from the instructions page. The line salary <- d$salary / (d$FTE / 100) just returns a vector; it does not add a column to d.

In your code, it should work as expected. You could also try this:


if( max(d$FTE) == 100 ) 
{ salary <- d$Salary / (d$FTE/100) } 

if( max(d$FTE) == 1 ) 
{ salary <- d$Salary / d$FTE } 

#d <- cbind(d, Salary.Fulltime = salary)
d$salary <- salary

Note how I did the last line differently.
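A small sketch of why that last line can matter, using toy data rather than the project dataset. Inside summarize(), dplyr looks for a name among the columns of the data frame first and only then in the calling environment. If salary exists only as a free-standing vector, it is not split by group, so every group gets the same quantiles, which would match the identical rows in the table above. The toy values below are made up for illustration.

```r
library( dplyr )

# Toy data, not from the project dataset
toy <- data.frame( title  = c( "A", "A", "B", "B" ),
                   Salary = c( 10, 20, 30, 40 ) )

# A free-standing vector with the lowercase name, outside the data frame
salary <- toy$Salary * 2

# No column called `salary` exists yet, so summarize() falls back to the
# free-standing vector, which is not split by group: both groups get 50
t1 <- toy %>%
  group_by( title ) %>%
  summarize( q50 = quantile( salary, 0.50 ) )

# Once the vector is stored as a column, the column is used and the
# median is computed within each group: 30 for A, 70 for B
toy$salary <- salary
t2 <- toy %>%
  group_by( title ) %>%
  summarize( q50 = quantile( salary, 0.50 ) )

t1
t2
```

With cbind(d, Salary.Fulltime = salary), the new column is named Salary.Fulltime, so a call to lowercase salary inside summarize() would still pick up the free-standing vector.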

Hope that helps!

chmao99 commented 2 years ago

Thanks a lot for your nice reply!

However, I still think that the tables and graph in Part 1 of the instructions were generated from the original salary. In step 10, Prescott, Edward is ranked No. 5 with a salary of $340,159. I double-checked this against the original dataset, as shown below, and it is his original salary. Since his FTE is 78, he should be ranked No. 1 in the full-time salary dataset.

which(d$Full.Name == "Prescott, Edward")
[1] 8989

d[8989,]
     Calendar.Year        Full.Name   Job.Description Department.Description      Salary FTE
8989          2020 Prescott, Edward Regents Professor          WPC Economics $340,159.00  78
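A quick check of the arithmetic for this record, using the formula from the instructions page. The variable names are hypothetical, and the salary is entered as a plain number on the assumption that the "$" and "," have already been stripped.

```r
# Values copied from the record printed above
salary.original <- 340159
fte <- 78

# Same formula as the instructions: salary / (FTE / 100)
salary.fulltime <- salary.original / ( fte / 100 )
round( salary.fulltime )
```

This gives a full-time salary of roughly $436,101, well above the $340,159 shown in the instructions table.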