Watts-College / cpp-527-fall-2021

A course shell for CPP 527 Foundations of Data Science II
https://watts-college.github.io/cpp-527-fall-2021/

Final Project Batch File #83

Open ndavis4904 opened 2 years ago

ndavis4904 commented 2 years ago

One thing I noticed that caused a problem for me: in every year other than 2020 there is no space after the ',' in the full name column. There is also a box with text in it beyond the last column that I think does weird things.

The problem I am having is this error message: "Error in plot.window(xlim = c(40000 - 10000, xmax), ylim = c(0, ymax + : need finite 'xlim' values"

Any ideas on what is going on?

lecy commented 2 years ago

It might be one of two things.

Either the name is not parsing correctly, so the first name field is empty, so there is no gender info, and the graphs run into problems because they are trying to use empty tables.

It's definitely better to use the comma as the split value instead of comma+space. You can always remove the space with trimws().


# test data for function development
x <- head( d$Full.Name, 25 )

# get second element from a vector v
return2 <- function(v){ return( v[2] ) }  

# drop the last name
x <- strsplit( x, "," )
x <- sapply( x, return2 )
x <- trimws( x )

# try this version on the untouched names to see why it doesn't work with 2019 data
x <- head( d$Full.Name, 25 )
x <- strsplit( x, ", " )
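
Here is a small self-contained sketch of the difference, in case it helps to see it outside the real data. The names are made up, and get_first() is just a hypothetical wrapper around the steps above, so treat it as an illustration rather than the project solution.

```r
# made-up names, formatted the two ways described in this thread
names.2020 <- c( "SMITH, JANE", "DOE, JOHN" )   # comma + space
names.2019 <- c( "SMITH,JANE", "DOE,JOHN" )     # comma only

# get second element from a vector v
return2 <- function(v){ return( v[2] ) }

# hypothetical helper: split the full name, keep the second piece, trim it
get_first <- function( full.name, split )
{
  parts <- strsplit( full.name, split, fixed=TRUE )
  first <- sapply( parts, return2 )
  return( trimws( first ) )
}

# splitting on the comma alone works for both formats
first.comma.2020 <- get_first( names.2020, "," )
first.comma.2019 <- get_first( names.2019, "," )

# splitting on comma+space finds nothing to split in the 2019 format,
# so there is no second element and every first name comes back NA
first.space.2019 <- get_first( names.2019, ", " )
```

Running table( is.na( first.space.2019 ) ) on your own data is a quick way to see whether the names parsed before the gender step.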

Or it could be an issue with salaries because of FTE normalization. Some years use a scale of 1 to 100, others a scale of 0 to 1.

I suggested adding this condition to your function so it can handle both cases.

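# na.rm=TRUE so a missing FTE does not turn the condition into NA
max.fte <- max( d$FTE, na.rm=TRUE )

# anything above 1 is treated as the 0 to 100 scale
if( max.fte > 1 )
{  salary <- d$salary / (d$FTE/100) }

if( max.fte <= 1 )
{ salary <- d$salary / d$FTE }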
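
The "need finite 'xlim' values" message means one of the xlim values is NA, NaN or Inf, so it is worth checking the normalized salaries before plotting. The sketch below uses made-up numbers, and normalize_salary() is a hypothetical helper. It assumes the bad value comes from an FTE of zero or a missing FTE, which is only one possible cause.

```r
# made-up data: FTE on the 0 to 100 scale, one row with an FTE of 0
d <- data.frame( salary = c( 50000, 30000, 40000 ),
                 FTE    = c( 100, 50, 0 ) )

# hypothetical helper: put FTE on the 0 to 1 scale, then normalize salary
normalize_salary <- function( salary, fte )
{
  # assumption: any value above 1 means the 0 to 100 scale
  if( max( fte, na.rm=TRUE ) > 1 ) { fte <- fte / 100 }

  # dividing by an FTE of 0 gives Inf, so treat it as missing instead
  fte[ which( fte == 0 ) ] <- NA

  return( salary / fte )
}

salary <- normalize_salary( d$salary, d$FTE )

# count the values that would break xlim: NA, NaN, Inf
sum( ! is.finite( salary ) )

# na.rm=TRUE drops the missing value before computing the axis limit
xmax <- max( salary, na.rm=TRUE )
```

If every salary in a group is missing, max() with na.rm=TRUE returns -Inf with a warning, which would trigger the same plot.window error, so it is also worth checking that the table going into the graph is not empty.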