JuliaText / NameToGender.jl

Guess gender based on first name

NameToGender


Note: Guessing gender from a first name is generally a terrible idea and should be avoided.

There is no direct mapping from names to gender, and even the names we list as Male or Female are by no means certain. And while some names are listed as Androgynous, this comes nowhere near reflecting the reality of nonbinary genders. If you are designing data collection for demographic purposes and want a gender field, include one (and be sure to also allow a free-text option).

You definitely should not use this package to make inferences about individual names. With that said, for some basic statistics of large populations, particularly within countries that we have data for (using the two-argument form), it is probably not completely misleading. For example (and the reason this was created), given a dataset of all papers published in a field, it could be used to estimate statistics such as the proportion of papers that have all male authors versus all female authors. Even for such large-scale population statistics, the results should still not be used as actual data for purposes of study or policy.
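The kind of aggregate count described above could be sketched roughly as follows. To keep the sketch runnable on its own, it uses a stand-in copy of the enum and a toy lookup table; `toy_names`, `toy_classify`, `author_makeup`, and the `papers` data are all hypothetical and not part of the package. With the package installed, `classify_gender` would presumably be passed in place of `toy_classify`.

```julia
# Stand-in copy of the package's enum so the sketch runs without the package.
@enum GenderUsage Male=-2 MostlyMale=-1 Androgynous=0 MostlyFemale=1 Female=2

# Hypothetical toy lookup table, NOT the package's database.
toy_names = Dict("Ada" => Female, "Sally" => Female, "Billie" => MostlyFemale,
                 "Tom" => Male, "Jon" => Male)
toy_classify(name) = get(toy_names, name, missing)

# Label one paper's author list as :all_male, :all_female, :mixed,
# or :unknown (when any author name is not in the lookup).
function author_makeup(authors; classify = toy_classify)
    usages = classify.(authors)
    any(ismissing, usages) && return :unknown
    all(u -> u <= MostlyMale, usages) && return :all_male
    all(u -> u >= MostlyFemale, usages) && return :all_female
    return :mixed
end

# Hypothetical dataset: one vector of author first names per paper.
papers = [["Ada", "Sally"], ["Tom", "Jon"], ["Billie", "Tom"], ["Linden"]]

labels = author_makeup.(papers)
share_all_male = count(==(:all_male), labels) / length(labels)
share_all_female = count(==(:all_female), labels) / length(labels)
```

Papers with any unrecognised name are kept as their own `:unknown` category rather than being silently dropped, so the shares stay relative to the whole dataset.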

Usage

NameToGender exports one function with two methods: classify_gender(name) and classify_gender(name, country). The latter is country-sensitive. See the docstrings for more information on that.

classify_gender returns a value from the GenderUsage Enum:

@enum GenderUsage Male=-2 MostlyMale=-1 Androgynous=0 MostlyFemale=1 Female=2

You can use that directly, e.g.

julia> classify_gender("Billie")
MostlyFemale::NameToGender.GenderUsage = 1

julia> classify_gender("Ada")
Female::NameToGender.GenderUsage = 2

or via comparison (though that does mean remembering the Enum's order)

julia> people = ["Billie", "Ada", "Tom", "Jon", "Sally"]
5-element Array{String,1}:
 "Billie"
 "Ada"
 "Tom"
 "Jon"
 "Sally"

julia> prob_ladies = people[classify_gender.(people) .>= MostlyFemale ]
3-element Array{String,1}:
 "Billie"
 "Ada"
 "Sally"
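If remembering the order is a nuisance, one option is to wrap the checks in small named predicates. The following is only a sketch: it uses a stand-in copy of the enum so that it runs on its own, and `leans_female` and `leans_male` are hypothetical helpers, not functions provided by the package.

```julia
# Stand-in copy of the package's enum so the sketch runs without the package.
@enum GenderUsage Male=-2 MostlyMale=-1 Androgynous=0 MostlyFemale=1 Female=2

# Enum values compare by their underlying integer codes.
Int(MostlyFemale)             # 1
Male < Androgynous < Female   # true

# Hypothetical predicates that spell out the categories explicitly.
# The ismissing guard makes them return false for names that were not found.
leans_female(u) = !ismissing(u) && u in (MostlyFemale, Female)
leans_male(u) = !ismissing(u) && u in (MostlyMale, Male)

# Made-up results, as if returned by classify_gender.(some_names).
usages = [MostlyFemale, Female, Male, missing, Female]

female_leaning = filter(leans_female, usages)
male_leaning = filter(leans_male, usages)
```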

If a name is not found in the database of names then missing is returned.

julia> using Missings # Required in julia 0.6 for ismissing etc.

julia> classify_gender("Linden")
missing

julia> ismissing.(classify_gender.(["Linden", "Lyndon"]))
2-element BitArray{1}:
  true
  false
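Because comparisons involving missing themselves evaluate to missing, a comparison mask needs some care when unknown names are present. A minimal sketch on a recent Julia version, using a stand-in copy of the enum and made-up results (`first_names` and `usages` are hypothetical data, not output from the package):

```julia
# Stand-in copy of the package's enum so the sketch runs without the package.
@enum GenderUsage Male=-2 MostlyMale=-1 Androgynous=0 MostlyFemale=1 Female=2

# Made-up results, as if returned by classify_gender.(first_names).
first_names = ["Billie", "Linden", "Ada"]
usages = [MostlyFemale, missing, Female]

# Mask of names that were found, and the fraction of the input they cover.
known = .!ismissing.(usages)
coverage = count(known) / length(first_names)

# Treat "not found" as "does not match" by replacing missing with false
# before using the comparison as an index.
prob_female = first_names[coalesce.(usages .>= MostlyFemale, false)]
```

Reporting `coverage` alongside any aggregate makes it visible how much of the population the statistic is actually based on.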

License and Origin

This code is licensed GPLv3+. See license.txt. This code is based on

Note: The data file nam_dict.txt is released under the GNU Free Documentation License.