CenterForAssessment / randomNames

Function to generate random gender- and ethnicity-correct first and/or last names. Names are chosen proportionally based upon their probability of appearing in a large-scale database of real names.
https://centerforassessment.github.io/randomNames

Please add argument `initial.letter =` #74

Closed by aito123 2 years ago

aito123 commented 2 years ago

Hello, I really like this package; it has automated a lot of my work. It would be perfect if I could choose random names that start with a specific letter through an argument to the function, for example:

randomNames(n=5, gender=1, ethnicity = 4, which.names="first", initial.letter="M") # [1] "Maria" "Magdalena" "Margarita" "Margot" "Milagros"

Thank you, and I look forward to an answer. Greetings from Peru.

dbetebenner commented 2 years ago

Thanks for the suggestion. This could get complicated. Given that there isn't an endless supply of names in the internal dataset, requests for certain initial.letter, gender, ethnicity combinations would likely yield no names or repeats.

Consider the following wrapper function:

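firstLetter <- function(tmp_data, first_letter, count) {
    # Keep the names whose first character equals first_letter,
    # then draw `count` of them with replacement (so repeats are possible)
    sample(tmp_data[first_letter == substr(tmp_data, start = 1, stop = 1)], count, replace = TRUE)
}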

firstLetter(randomNames(n=5000, gender=1, ethnicity = 4, which.names="first"), first_letter="Q", count=5)

> firstLetter(randomNames(n=5000, gender=1, ethnicity = 4, which.names="first"), first_letter="Q", count=5)

[1] "Quanisha"  "Quetzally" "Quetzally" "Quetzally" "Quanisha"

There are (it seems) only about 3 unique gender=1 ethnicity=4 names that start with a Q.

I have to request 5000 names so that the sample is large enough to include at least one name starting with Q.
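If the pool happens to contain no name with the requested initial, `sample()` has nothing to draw from and stops with an error. A variation on the wrapper that handles that case, and can optionally return only distinct names, might look like the sketch below. The function name `sampleByInitial` and its `unique_only` argument are hypothetical and not part of randomNames; the example pool is a made-up stand-in for the output of `randomNames()`.

```r
# Hypothetical helper (not part of the randomNames package): filter a pool of
# names by initial letter and sample from the matches.
sampleByInitial <- function(names_pool, initial_letter, count, unique_only = FALSE) {
  # Case-insensitive match on the first character; which() drops NA entries
  keep <- which(toupper(substr(names_pool, 1, 1)) == toupper(initial_letter))
  matches <- names_pool[keep]
  if (unique_only) {
    matches <- unique(matches)
  }
  # Return an empty vector (with a warning) instead of failing on an empty pool
  if (length(matches) == 0L) {
    warning("No names in the pool start with '", initial_letter, "'")
    return(character(0))
  }
  if (unique_only) {
    # Without replacement we cannot return more names than are available
    count <- min(count, length(matches))
  }
  # Sample by index; with unique_only = TRUE no name is returned twice
  matches[sample.int(length(matches), count, replace = !unique_only)]
}

# Stand-in pool for illustration only (in practice, the output of randomNames())
pool <- c("Maria", "Magdalena", "Quetzally", "Quanisha", "Quanisha", "Rosa")
sampleByInitial(pool, "Q", 5)                      # five draws, repeats possible
sampleByInitial(pool, "Q", 5, unique_only = TRUE)  # at most the distinct Q names
```

With the package installed, the pool could be generated as in the call above, for example `sampleByInitial(randomNames(n=5000, gender=1, ethnicity = 4, which.names="first"), "Q", 5, unique_only = TRUE)`. Returning fewer names than requested makes the scarcity visible rather than hiding it behind repeats.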

Is this what you're (roughly) looking to do?

aito123 commented 2 years ago

Amazing, thank you very much. This is, roughly speaking, what I was looking for. Although an `initial.letter =` parameter would be amazing, I understand that the database may not have the same number of names for every letter. Anyway, I am grateful for your answer. Greetings from Perú.