Watts-College / cpp-527-fall-2021

A course shell for CPP 527 Foundations of Data Science II
https://watts-college.github.io/cpp-527-fall-2021/

step 3-final project #69

Open Jana-Ajeeb opened 3 years ago

Jana-Ajeeb commented 3 years ago

I tried to run this part of step 3, but it gives me the error below:

d$gender[ is.na(d$gender) ] <- "uncoded"
d$gender <- factor( d$gender, levels=c("male","female","uncoded") )

Error in `$<-.data.frame`(`*tmp*`, gender, value = character(0)) : replacement has 0 rows, data has 12520

lecy commented 3 years ago

You should have two data frames at this point.

To add gender back to the data frame you need to merge or join the datasets.

d  # salary
gen # gender codes 

d <- # merge or join d and gen

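Until you do that, the variable gender will not appear in data frame d. Right now d$gender returns NULL, so the recode below produces an empty character(0) vector, and R refuses to assign 0 values to a data frame with 12520 rows.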

d$gender[ is.na(d$gender) ] <- "uncoded"
d$gender <- factor( d$gender, levels=c("male","female","uncoded") )
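The error can be reproduced on a tiny example. This is a sketch with a hypothetical three-row salary table standing in for d (not the course data): the recode fails while d has no gender column, and the same line works once the column exists.

```r
# Hypothetical toy salary table with no gender column yet
d <- data.frame( first.name = c("Ann", "Bob", "Cy"), salary = c(50, 60, 70) )

# d$gender does not exist, so it returns NULL
is.null( d$gender )   # TRUE

# The recode builds a zero-length vector and tries to put it into a 3-row data frame.
# is.na(NULL) may also warn, so warnings are silenced here; the error is captured.
res <- tryCatch(
  suppressWarnings( { d$gender[ is.na(d$gender) ] <- "uncoded"; "no error" } ),
  error = function(e) conditionMessage(e)
)
res   # "replacement has 0 rows, data has 3"

# Once a gender column exists (for example, after a merge), the same line works
d$gender <- c("female", NA, "male")
d$gender[ is.na(d$gender) ] <- "uncoded"
d$gender   # "female" "uncoded" "male"
```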
Jana-Ajeeb commented 3 years ago

I tried:

d <- merge(d, gen)

but it's giving me this error:

Error: memory exhausted (limit reached?)
Error during wrapup: memory exhausted (limit reached?)
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
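A likely cause, assuming d and gen share no column names (a later post in this thread merges on first.name in d and name in gen): merge() matches on the columns the two tables have in common by default, and when there are none it returns the Cartesian product, every salary row paired with every name row. The toy tables below are hypothetical; with 12520 salary rows and a large gen table, that product can easily exhaust memory.

```r
# Hypothetical toy tables: the key columns have different names
d   <- data.frame( first.name = c("Ann", "Bob", "Cy"), salary = c(50, 60, 70) )
gen <- data.frame( name = c("Ann", "Bob"), gender = c("female", "male") )

# No column names in common, so merge() has nothing to match on
intersect( names(d), names(gen) )   # character(0)

# With no `by` columns, merge() returns the Cartesian product
cp <- merge( d, gen )
nrow(cp)   # 6 = nrow(d) * nrow(gen)
```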

lecy commented 3 years ago

Review notes on merges:

http://ds4ps.org/dp4ss-textbook/p-076-merging-data.html

Specifically, you will need to be clear about which fields you are merging on, as well as whether you want data dropped during the merge.
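Both choices map onto arguments of base R's merge(): by.x and by.y name the key in each table, and all, all.x, and all.y control which unmatched rows are kept. A sketch on hypothetical toy tables (Cy has a salary but no gender code; Di has a gender code but no salary):

```r
# Hypothetical toy tables (not the course data)
d   <- data.frame( first.name = c("Ann", "Bob", "Cy"), salary = c(50, 60, 70) )
gen <- data.frame( name = c("Ann", "Bob", "Di"), gender = c("female", "male", "female") )

# Match d$first.name against gen$name under each keep/drop rule
inner <- merge( d, gen, by.x = "first.name", by.y = "name" )                # all = FALSE (default)
left  <- merge( d, gen, by.x = "first.name", by.y = "name", all.x = TRUE )  # keep every row of d
right <- merge( d, gen, by.x = "first.name", by.y = "name", all.y = TRUE )  # keep every row of gen
full  <- merge( d, gen, by.x = "first.name", by.y = "name", all = TRUE )    # keep every row of both

sapply( list( inner = inner, left = left, right = right, full = full ), nrow )
# inner = 2, left = 3, right = 3, full = 4
```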

Which of these is appropriate here?

[four images, not reproduced here]

voznyuky commented 3 years ago

I believe this is what we're supposed to do to merge the two data frames:

library( dplyr )   # provides the %>% pipe
library( pander )  # provides pander()
dat3 <- merge( x=d, y=gen, by.x = "first.name", by.y = "name", all = FALSE )
head(dat3) %>% pander()

But I'm still confused about how to incorporate this:

d$gender[ is.na(d$gender) ] <- "uncoded"
d$gender <- factor( d$gender, levels=c("male","female","uncoded") )

I get the same error message:
Error in `$<-.data.frame`(`*tmp*`, gender, value = character(0)) : replacement has 0 rows, data has 12520
lecy commented 3 years ago

You are recoding a variable that does not exist in d, so the replacement is empty. After the merge, the gender variable should be available in your dat3:

dat3$gender[ is.na(dat3$gender) ] <- "uncoded"
dat3$gender <- factor( dat3$gender, levels=c("male","female","uncoded") )
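A sketch of the recode on merged data, assuming a left join (all.x = TRUE) so that unmatched salary rows survive with NA in gender. With all = FALSE, unmatched names are dropped during the merge, so there is no NA left to recode. The toy tables are hypothetical, and stringsAsFactors = FALSE keeps gender as character so that "uncoded" can be assigned on older R versions too.

```r
# Hypothetical toy tables; Cy has no gender code
d   <- data.frame( first.name = c("Ann", "Bob", "Cy"), salary = c(50, 60, 70) )
gen <- data.frame( name = c("Ann", "Bob"), gender = c("female", "male"),
                   stringsAsFactors = FALSE )

# Left join: keep every salary row, fill gender with NA where no name matched
dat3 <- merge( x = d, y = gen, by.x = "first.name", by.y = "name", all.x = TRUE )

# Recode the missing values, then fix the level order
dat3$gender[ is.na(dat3$gender) ] <- "uncoded"
dat3$gender <- factor( dat3$gender, levels = c("male", "female", "uncoded") )
table( dat3$gender )   # male 1, female 1, uncoded 1
```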
lecy commented 3 years ago

Why did you select this argument?

Make sure you do not drop observations from the salary database.

merge( ..., all = FALSE )
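One way to confirm that no salary observations were dropped is to compare row counts before and after the merge. This is a sketch on hypothetical toy tables; the duplicated-name check is an added assumption worth testing, since repeated names in gen would make a left join add rows rather than drop them.

```r
# Hypothetical toy tables (not the course data)
d   <- data.frame( first.name = c("Ann", "Bob", "Cy"), salary = c(50, 60, 70) )
gen <- data.frame( name = c("Ann", "Bob", "Di"), gender = c("female", "male", "female") )

# all = FALSE keeps only matched names, so Cy's salary row disappears
nrow( merge( x = d, y = gen, by.x = "first.name", by.y = "name", all = FALSE ) )   # 2

# all.x = TRUE keeps every row of d
dat3 <- merge( x = d, y = gen, by.x = "first.name", by.y = "name", all.x = TRUE )

# Guard: each name appears once in gen, and every salary row survived exactly once
stopifnot( !anyDuplicated( gen$name ), nrow( dat3 ) == nrow( d ) )
```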
Jana-Ajeeb commented 3 years ago

Ohh, now I get it: we should use the 1st one.

Thanks a lot!!