berkmancenter / namae

Namae (名前) parses personal names and splits them into their component parts.

incorrectly parsed name: "laxmi zasdfasdf" #5

Closed jasdeepgosal closed 9 years ago

jasdeepgosal commented 10 years ago

I ran across this name (though this is an anonymized version, obviously) that isn't parsing correctly:

irb(main):006:0> Namae.parse("laxmi zasdfasdf").first
=> #<Name family="zasdfasdf" particle="laxmi">

I would've expected:

irb(main):006:0> Namae.parse("laxmi zasdfasdf").first
=> #<Name family="zasdfasdf" given="laxmi">

inukshuk commented 10 years ago

The parser takes letter case into account: a lower-case word in front of the family name is treated as a particle, which is how something like 'van Beethoven' is detected. Is it legitimate, in your case, that both names are lower case?

jasdeepgosal commented 10 years ago

It is legitimate that both names are lower case.

inukshuk commented 10 years ago

Hmm, I'll make a note of this, but off the top of my head I have no good idea how to support all-lower-case names without resorting to pattern matching for particles. Do you have any suggestions?

If you know for a fact that there are no particles in your data set, you could perhaps patch the parser to turn particle into given after parsing?
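A minimal sketch of that post-processing idea, assuming the data set really contains no particles. The helper name `particle_to_given` is hypothetical and not part of Namae. `NameStub` is a stand-in with the same `family`, `given` and `particle` accessors, used here only so the snippet runs without the gem installed:

```ruby
# Stand-in for a parsed name (assumption: the real parsed object exposes
# writable `given` and `particle` accessors in the same way).
NameStub = Struct.new(:family, :given, :particle)

# Hypothetical helper: if no given name was detected but a particle was,
# treat the particle as the given name. Only safe when the data set is
# known to contain no genuine particles such as "van" or "de".
def particle_to_given(name)
  if name.given.nil? && !name.particle.nil?
    name.given = name.particle
    name.particle = nil
  end
  name
end

# Mirrors the parse result reported above for "laxmi zasdfasdf".
parsed = NameStub.new("zasdfasdf", nil, "laxmi")
fixed  = particle_to_given(parsed)
```

With the gem loaded, the same helper would presumably be applied to each result, along the lines of `Namae.parse(str).map { |n| particle_to_given(n) }`. Names that already have a given name are left untouched, so 'Ludwig van Beethoven' would keep its particle.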