Jana-Ajeeb opened this issue 3 years ago.
You should have two data frames at this point.
To add gender back to the data frame you need to merge or join the datasets.
d # salary
gen # gender codes
d <- # merge or join d and gen
Until you do that, the variable gender will not appear in data frame d, and the recoding lines below will fail.
d$gender[ is.na(d$gender) ] <- "uncoded"
d$gender <- factor( d$gender, levels=c("male","female","uncoded") )
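A minimal sketch of the whole step on made-up data. It assumes, as in the merge call later in this thread, that d stores first names in a column called first.name and gen stores them in name, next to a gender column; all values below are hypothetical. all.x = TRUE is a left join: every salary row is kept, names missing from gen get NA for gender, and the two recoding lines then turn those NAs into "uncoded".

```r
# Toy stand-ins for the real data frames (hypothetical values)
d <- data.frame(
  first.name = c("Ann", "Bob", "Xiu"),
  salary     = c(50000, 62000, 58000),
  stringsAsFactors = FALSE
)
gen <- data.frame(
  name   = c("Ann", "Bob", "Cal"),
  gender = c("female", "male", "male"),
  stringsAsFactors = FALSE
)

# Left join: keep every row of d, add gender where the first name matches
d <- merge(x = d, y = gen, by.x = "first.name", by.y = "name", all.x = TRUE)

# The gender column now exists, so the recoding works
d$gender[ is.na(d$gender) ] <- "uncoded"
d$gender <- factor( d$gender, levels = c("male", "female", "uncoded") )

table(d$gender)   # one of each level in this toy data
```

"Cal" appears only in gen, so the left join leaves it out; "Xiu" has no match in gen, so it ends up as "uncoded".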
I tried:
d <- merge(d, gen)
But it's giving me this error:
Error: memory exhausted (limit reached?) Error during wrapup: memory exhausted (limit reached?) Error: no more error handlers available (recursive errors?); invoking 'abort' restart
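One likely cause, assuming d and gen have no column name in common: when merge() is called without by, it joins on intersect(names(x), names(y)), and if that is empty the result is the Cartesian product, every row of x paired with every row of y. With thousands of salary rows and a long list of names, that table can easily exhaust memory. A small sketch on hypothetical data:

```r
# Two small data frames with no column names in common (hypothetical data)
x <- data.frame(first.name = c("Ann", "Bob", "Xiu"), stringsAsFactors = FALSE)
y <- data.frame(name = c("Ann", "Bob"), gender = c("female", "male"),
                stringsAsFactors = FALSE)

intersect(names(x), names(y))   # character(0): nothing to join on

cross <- merge(x, y)            # no `by` and no shared names -> Cartesian product
nrow(cross)                     # 3 * 2 = 6 rows

# Naming the key columns explicitly gives a real join instead
keyed <- merge(x, y, by.x = "first.name", by.y = "name")
nrow(keyed)                     # 2 matched rows
```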
Review notes on merges:
http://ds4ps.org/dp4ss-textbook/p-076-merging-data.html
Specifically, you need to be clear about which fields you are merging on, and whether you want rows without a match dropped during the merge.
Which of these is appropriate here?
I believe this is what we're supposed to do to merge the two data frames:
dat3 <- merge( x=d, y=gen, by.x = "first.name", by.y = "name", all = FALSE)
head(dat3) %>% pander()
But I'm still confused about how to incorporate this:
d$gender[ is.na(d$gender) ] <- "uncoded"
d$gender <- factor( d$gender, levels=c("male","female","uncoded") )
I get the same error message:
Error in `$<-.data.frame`(`*tmp*`, gender, value = character(0)) : replacement has 0 rows, data has 12520
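You are recoding a variable that does not exist yet: d has no gender column, so d$gender is NULL and the replacement ends up with 0 rows. After the merge, the gender variable should be available in your dat3: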
dat3$gender[ is.na(dat3$gender) ] <- "uncoded"
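To see where the "replacement has 0 rows" error comes from, here is a small reproduction on hypothetical data. Before the merge, d$gender is NULL, so is.na(d$gender) is logical(0) and the value assigned back is character(0), a zero-length vector that cannot become a column of a data frame that has rows. After the merge the column exists and the same line works. The sketch uses all.x = TRUE so that some rows actually have NA gender to recode.

```r
# Hypothetical salary data without a gender column
d <- data.frame(first.name = c("Ann", "Bob", "Xiu"),
                salary = c(50000, 62000, 58000),
                stringsAsFactors = FALSE)

is.null(d$gender)   # TRUE: the column does not exist yet

# Recoding the missing column fails with the "replacement has 0 rows" error
msg <- tryCatch({
  d$gender[ is.na(d$gender) ] <- "uncoded"
  "no error"
}, error = function(e) conditionMessage(e))
msg

# Recode the merged data frame instead, where gender exists
gen <- data.frame(name = c("Ann", "Bob"), gender = c("female", "male"),
                  stringsAsFactors = FALSE)
dat3 <- merge(x = d, y = gen, by.x = "first.name", by.y = "name", all.x = TRUE)
dat3$gender[ is.na(dat3$gender) ] <- "uncoded"
dat3$gender
```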
Why did you select this argument?
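Make sure you do not drop observations from the salary data: all = FALSE keeps only rows that match in both data frames, so employees whose first names are not in gen would be dropped.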
merge( ..., all = FALSE )
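A quick way to check whether a merge drops observations is to compare row counts for the different settings of all. A sketch on hypothetical data: all = FALSE (the default) is an inner join, all.x = TRUE keeps every row of the first data frame, and all = TRUE also keeps names from gen that match no employee.

```r
# Hypothetical data: "Xiu" is missing from gen, "Cal" is missing from d
d <- data.frame(first.name = c("Ann", "Bob", "Xiu"),
                salary = c(50000, 62000, 58000),
                stringsAsFactors = FALSE)
gen <- data.frame(name = c("Ann", "Bob", "Cal"),
                  gender = c("female", "male", "male"),
                  stringsAsFactors = FALSE)

n_inner <- nrow(merge(d, gen, by.x = "first.name", by.y = "name", all = FALSE))
n_left  <- nrow(merge(d, gen, by.x = "first.name", by.y = "name", all.x = TRUE))
n_full  <- nrow(merge(d, gen, by.x = "first.name", by.y = "name", all = TRUE))

# inner = 2, left = 3, full = 4, salary_rows = 3
c(inner = n_inner, left = n_left, full = n_full, salary_rows = nrow(d))
```

Only the left join returns exactly the salary rows here. If gen listed a name more than once, even the left join would return more rows than d, so it is worth checking that the counts match.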
Ohh, now I get it: we should use the first one.
Thanks a lot!!
I tried to run this part in step 3, but it's giving me the error below:
Error in `$<-.data.frame`(`*tmp*`, gender, value = character(0)) : replacement has 0 rows, data has 12520