mhucka closed this 4 months ago
A partial fix is now in the dev branch and will be in the upcoming 1.3.0 release. The new implementation of is_person() is not very accurate for names in CJK scripts, but it is still better than the current behavior, which is to always return False for CJK names.
Solving this problem properly turns out to be very difficult. I wish I could do something better than the current weak, home-grown heuristics. Unfortunately, this appears to be a research-grade problem that no one has solved. Even the best AI systems today can't reliably tell you whether, say, a given 1-3 character sequence in Chinese is the name of a person.
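To illustrate why simple heuristics fall short, here is a minimal sketch of a surname-based check. It is not the heuristic used in this project; the function name and the tiny surname sample are hypothetical and chosen only for illustration.

```python
# Hypothetical, deliberately tiny sample of common Chinese surnames.
# A real list would be far longer; this is only for illustration.
COMMON_SURNAMES = {"王", "李", "张", "黄", "高"}


def naive_is_chinese_name(text):
    """Guess that a 2-3 character string starting with a known surname is a name."""
    return 2 <= len(text) <= 3 and text[0] in COMMON_SURNAMES


# The first string is a plausible personal name. The other three are ordinary
# words ("kingdom", "gold", "happy") that happen to start with a surname
# character, yet the heuristic accepts all four.
for s in ["王小明", "王国", "黄金", "高兴"]:
    print(s, naive_is_chinese_name(s))
```

Every string in the loop passes the check, so a rule of this kind cannot separate names from common vocabulary without additional context.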
The current solution may be as good as we can get for now. I'm going to close this issue because it is unlikely that I can devote more time to this matter.
is_person() in name_utils.py will return False if a name string consists entirely of CJK characters. I wrote it this way at the time because name checkers like ProbablePeople can't handle CJK. However, that result is obviously wrong if the string really is a human name.
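For reference, a check for an all-CJK string could look roughly like the sketch below. This is not the actual code in name_utils.py; the helper name is_all_cjk and the choice of Unicode ranges are assumptions made for illustration, and the ranges cover only the most common blocks.

```python
import re

# Assumed set of ranges for this sketch; rarer blocks (for example the
# supplementary-plane ideograph extensions) are not covered.
_CJK_CHARS = re.compile(
    "["
    "\u3040-\u30ff"   # Hiragana and Katakana
    "\u3400-\u4dbf"   # CJK Unified Ideographs Extension A
    "\u4e00-\u9fff"   # CJK Unified Ideographs
    "\uac00-\ud7af"   # Hangul Syllables
    "]+"
)


def is_all_cjk(text):
    """Return True if text is non-empty and every character is in the ranges above."""
    return _CJK_CHARS.fullmatch(text) is not None


print(is_all_cjk("王小明"))      # Chinese
print(is_all_cjk("山田太郎"))    # Japanese
print(is_all_cjk("김민수"))      # Korean
print(is_all_cjk("John Smith"))  # Latin script
```

Note that a name written with a space between family and given name would not match, since the space is outside the listed ranges.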