CenterForAssessment / randomNames

Function to generate random gender and ethnicity correct first and/or last names. Names are chosen proportionally based upon their probability of appearing in a large scale data base of real names.
https://centerforassessment.github.io/randomNames

Handle NAs in randomNames()'s gender argument #56

Closed msenn closed 6 years ago

msenn commented 6 years ago

When calling randomNames(gender = g) on a gender vector that contains NAs, the generated names no longer represent gender correctly:

# Define gender vector
> (g <- rep(0:1, each = 3))
[1] 0 0 0 1 1 1

# Gender is correctly represented
> randomNames::randomNames(gender = g, which.names = "first")
[1] "Samuel"   "Carlos"   "Theodore" "Emlynn"   "Briana"   "Deborah" 

# Include NA in gender vector
> g[3] <- NA
> randomNames::randomNames(gender = g, which.names = "first")
[1] "Maleeha" "Sean"    "Sad"     "Sang"    "Labeeba" "Carter" 

# First gender is 0 (male)
> g[1]
[1] 0

# "Maleeha" is not in any of the male first-name lists
> fn_male <- grep("^first.*g0$", names(randomNames::randomNamesData))
> sapply(fn_male, function(i)  "Maleeha" %in% names(randomNames::randomNamesData[[
+     names(randomNames::randomNamesData)[i]
+     ]])
+ )
[1] FALSE FALSE FALSE FALSE FALSE FALSE

dbetebenner commented 6 years ago

Thanks for spotting that. I've fixed it. Let me know if there's anything else. The new version is 1.3-0.0. It's available on GitHub now; I'll post it to CRAN shortly, and it should hopefully be available there in a day or two.

msenn commented 6 years ago

Thanks, Damian, for looking into this. As far as I can tell, the non-NA genders are now populated correctly.

May I ask what behavior you've chosen for the cases where gender is NA? I can think of multiple options that seem sensible:

Possibly, the information on the chosen behavior could be included in the documentation of the function.

dbetebenner commented 6 years ago

Good idea (to include in documentation).

Currently it just randomly samples a gender/ethnicity and replaces the NA with that. https://github.com/CenterForAssessment/randomNames/blob/master/R/randomNames.R#L61-L62
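
For readers who want to see what "sample a value and replace the NA" amounts to, here is a minimal base R sketch. It is not the package's code: `fill_na_gender` is a hypothetical helper, the uniform sampling is an assumption (the linked lines may sample differently), and the codes 0 and 1 are taken from the example earlier in this thread.

```r
# Hypothetical helper, NOT part of randomNames: replace each NA in a gender
# vector with a randomly drawn code. Uniform sampling is an assumption.
fill_na_gender <- function(gender, choices = 0:1) {
  na_idx <- is.na(gender)
  # Draw positions with sample.int() and index into 'choices'; this avoids
  # sample()'s special handling of a length-one numeric first argument.
  draws <- choices[sample.int(length(choices), sum(na_idx), replace = TRUE)]
  gender[na_idx] <- draws
  gender
}

set.seed(1)                      # only to make the illustration repeatable
g <- c(0, 0, NA, 1, 1, 1)        # same shape as the vector in the report
g_filled <- fill_na_gender(g)    # position 3 becomes 0 or 1; the rest is unchanged
```

Filling the NAs before calling randomNames(gender = g_filled, which.names = "first") keeps the choice explicit and reproducible on the caller's side, instead of relying on whatever the function does internally.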