Closed (end0 closed this issue 10 years ago)
Playing with it a bit, it seems that adding the set below to TITLES solves the common cases, and even some of the rarer ones.
['senior', 'magistrate', 'judge', 'mag', 'magistrate-judge', 'mag-judge', 'honorable', 'hon', 'designated']
I'm happy to donate the full names dataset to you for testing purposes. Just let me know.
Wow, I'm surprised that some of those titles aren't already in there. Thanks very much for passing those along. I'll add them to the project's constant and reopen this issue to track it.
I would gladly accept your real world dataset to test against. I haven't gotten to test it against much real world data.
I'm surprised by the last two you mention. "Hon." should be parsed correctly, e.g.:
Sprout:python-nameparser derek$ ./tests.py "Hon. Charles J. Siragusa"
<HumanName : [
Title: 'Hon.'
First: 'Charles'
Middle: 'J.'
Last: 'Siragusa'
Suffix: ''
Nickname: ''
]>
Are you using version 0.2.9 (the latest)?
If you happen to be dealing with a number of names made up of a title and a first name, like "Sir Gerald", there are some changes in master that might be helpful. See #7.
I have a dataset ready for you (though there are some non-conforming names). I'm not sure how to upload directly to the issue tracker, so let me know where to send it or how to upload it. Alternatively, I can paste it here, but I don't want to pollute the issue tracker.
[EDIT] Just realized I can create a gist which should be just as good. https://gist.github.com/end0/daa8378d06642b69db77 [/EDIT]
I've been playing with it a bit more, and it looks like this set takes care of most of the issues, though I worry about over-specifying for this dataset (especially the first half of the set).
['us', 'sr judge', 'special', 'senior-judge', 'pslc', 'pro se', 'law clerk', 'docket', 'mag/judge', 'federal', 'edmi', 'discovery', 'senior', 'magistrate', 'judge', 'mag', 'magistrate-judge', 'mag-judge', 'honorable', 'hon', 'designated', 'district']
Just FYI, these are the names / titles of federal US judges. It may not encompass state or other courts properly, but hopefully a good start.
Thanks a bunch. That's really helpful. I'll get those added to the project.
Titles can be chained, so adding "sr" and "judge" should also take care of "sr judge". The same is not true of sr-judge, though I'm wondering if that might not be a bad idea (count "-" or "/" as a space to allow joining when appearing in titles).
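The idea of counting "-" or "/" as a space inside titles could be sketched roughly as follows. This is a standalone illustration, not nameparser's actual parsing code: `split_titles` and the small `TITLES` set are hypothetical stand-ins, and the real constant holds many more entries.

```python
import re

# Hypothetical stand-in for the project's titles constant
# (lowercase, without trailing periods).
TITLES = {
    "sr", "senior", "magistrate", "mag", "judge", "chief",
    "hon", "honorable", "us", "district", "designated",
}


def split_titles(full_name, titles=TITLES):
    """Return (title_words, remaining_words) for a name string.

    Leading words are chained into the title for as long as every piece
    of the word, after splitting on "-" or "/", is a known title.
    """
    words = full_name.split()
    title_words = []
    for word in words:
        # Treat "-" and "/" like spaces, but only for the title check;
        # the original word is kept intact in the output.
        parts = [p.strip(".").lower() for p in re.split(r"[-/]", word)]
        if all(p in titles for p in parts):
            title_words.append(word)
        else:
            break
    return title_words, words[len(title_words):]


print(split_titles("Magistrate-Judge Elizabeth Todd Campbell"))
print(split_titles("Sr US District Judge Richard G Kopf"))
```

Because the split is only used for the lookup, a hyphenated surname such as "Smith-Jones" is left alone unless both halves happen to be titles.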
The only reason that a potential title should be omitted from the titles constant is if it could also be a first name. Any strings in the titles constant will never be considered a first name. Other than that, I don't see any reason to not include every possible title.
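One way to apply that rule before adding new entries is to check each candidate against a list of first names. The sketch below is only an illustration under assumptions: `partition_candidates` is a hypothetical helper, and `KNOWN_FIRST_NAMES` is a tiny made-up sample, not something shipped with the project.

```python
# Hypothetical sample of first names that also look like titles;
# a real check would use a much larger list.
KNOWN_FIRST_NAMES = {"dean", "earl", "major", "duke"}


def partition_candidates(candidates, first_names=KNOWN_FIRST_NAMES):
    """Split candidate titles into (safe, ambiguous) sorted lists.

    Candidates are lowercased and stripped of surrounding whitespace and
    periods, which also removes duplicates such as 'Hon.' and 'hon'.
    """
    normalized = {c.strip().strip(".").lower() for c in candidates}
    safe = sorted(normalized - first_names)
    ambiguous = sorted(normalized & first_names)
    return safe, ambiguous


safe, ambiguous = partition_candidates(
    ["Senior", "magistrate", "Hon.", "hon", "dean", "magistrate"]
)
print(safe)       # candidates that can go straight into the titles constant
print(ambiguous)  # candidates that could also be a first name
```

Anything in the second list needs a judgment call, since adding it would stop it from ever being parsed as a first name.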
I think I got all of these added, as well as a few that some quick googling turned up. The data you provided made me notice a few other things we might be able to handle better too. Thanks for providing it.
Hey -
First off, awesome package. I've been working with a dataset of ~3000 judges and associated titles, and noticed nameparser doesn't pick most (well, any) of them up. Below is the filtered list with at least a few examples/variations on each. I'm happy to do the changes if you'd like. Let me know.
common
Magistrate Judge John F. Forster, Jr
Magistrate Judge Joaquin V.E. Manibusan, Jr
Magistrate-Judge Elizabeth Todd Campbell
Mag-Judge Harwell G Davis, III
Mag. Judge Byron G. Cudmore
Chief Judge J. Leon Holmes
Chief Judge Sharon Lovelace Blackburn
Judge James M. Moody
Judge G. Thomas Eisele
Judge Callie V. S. Granade
Judge C Lynwood Smith, Jr
Senior Judge Charles R. Butler, Jr
Senior Judge Harold D. Vietor
Senior Judge Virgil Pittman
Honorable Terry F. Moorer
Honorable W. Harold Albritton, III
Honorable Judge W. Harold Albritton, III
Honorable Judge Terry F. Moorer
Honorable Judge Susan Russ Walker
Hon. Marian W. Payson
Hon. Charles J. Siragusa
rare
US Magistrate Judge T Michael Putnam
Designated Judge David A. Ezra
Sr US District Judge Richard G Kopf