Open srisi opened 5 years ago
I think we're calling nameparser's capitalize method incorrectly - putting a pin in this with this comment to come back - see, e.g., Mccormack above, which is correctly transcribed McCormack in the spreadsheet/csv.
I don't think this stuff in name_parser.py does quite what we want it to:
name = HumanName(name_raw)
# If first and middle initials have periods but not spaces -> separate, e.g. "R.K. Teague"
if re.match('[a-zA-Z]\.[a-zA-Z]\.', name.first):
    name.middle = name.first[2]
    name.first = name.first[0]

name.last = name.last.capitalize()
name.first = name.first.strip('.').capitalize()
name.middle = name.middle.strip('.').capitalize()
Poked around a little, and I think the thing to do is to set all of the name fields, run name.capitalize() on the whole name to modify it in place, and then extract name.last, name.first, and name.middle. Will take a proper look tomorrow.
@srisi
In [1]: from nameparser import HumanName
In [2]: name = HumanName('McCormack, E. L.')
In [3]: name.last
Out[3]: 'McCormack'
In [4]: name.last.capitalize()
Out[4]: 'Mccormack'
whereas...
In [15]: name = HumanName('Mccormack, E. L.')
In [16]: name.capitalize(force=True)
In [17]: name.last
Out[17]: 'McCormack'
... which does seem off, but there you go. I'll fix and make a PR.
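For reference, the `Out[4]` result above comes from Python's built-in `str.capitalize()`, which uppercases the first character and lowercases everything after it, so any interior capital is lost. The sketch below shows that behavior next to a guard that only touches single-case input. `capitalize_if_single_case` is a hypothetical helper for illustration, not part of nameparser and not the code in name_parser.py, and unlike nameparser it does not restore the capital in an all-lowercase 'mccormack'.

```python
def capitalize_if_single_case(part):
    """Capitalize only when the input is entirely upper or lower case.

    Hypothetical helper (not nameparser's API): mixed-case input such as
    'McCormack' is assumed to be deliberately cased and is returned as-is.
    """
    if part.isupper() or part.islower():
        return part.capitalize()
    return part


# str.capitalize() lowercases everything after the first character.
print('McCormack'.capitalize())                # Mccormack

# The guarded version leaves mixed-case names alone...
print(capitalize_if_single_case('McCormack'))  # McCormack

# ...but still normalizes names typed in a single case.
print(capitalize_if_single_case('TEAGUE'))     # Teague
print(capitalize_if_single_case('teague'))     # Teague
```

This only illustrates why calling `.capitalize()` on each field separately is lossy; the fix proposed in this thread is still to let nameparser's own `name.capitalize()` handle the whole name.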
[2019-05-13 -- this comment replaced with updated list below]
Computation Center phone directory records from 1955-56 will help address some of these; we should have images of them tomorrow.
For some reason @erica02139's latest edited list sent by email didn't get attached to this issue. Here it is, replacing my comment above:
[deleted -- old, replaced by Erica's list below]
Thanks, @ryaanahmed! I have done more and will post tonight. Also, here are Computation Center directory records from 1956-1963 (I will post them on Slack, too): https://drive.google.com/open?id=1-34DNXHFO6kgb2lXIzL9W-qouWJzN8RQ
Perfect -- I just went over and took lots of pictures of the area. Will make a story.
Last name:
Arden
First name variants:
Bruce W. [real: mez]
Unknown [2-14, p. 61 is likely “Arden, Dean N.”; 3-9, p. 3 and 3-10, p. 88 likely “Arden, Bruce W.” who was “Mr.” not “Dr.”]
Dean M. [chg to “Dean N.”: mez]
Dean A. [chg to “Dean N.”: text is “Dean”: mez]
Dean N. [real: mez]
Last name:
Brown
First name variants:
Sanborn C. [real: mez]
Gordon S. [real: mez]
Unknown [chg to “Brown, Gordon S.”: mez]
E. Cary [real: mez]
Last name:
Caldwell
First name variants:
Samuel H. [real: mez]
David O. [real: mez]
Last name:
Campbell
First name variants:
Elizabeth J. [real: mez]
Ashley S. [real on 2-14, p. 102; mis-entered for 2-14, p. 85 (chg to “unknown”): mez]
Unknown [chg to “Ashley S.”: mez]
Pamela [real: added details to 2-26, doc 2 “note” field: mez]
Last name:
Case
First name variants:
Harold [real: mez]
Leon W. [real: mez]
Last name:
Clark
First name variants:
George W. [real: mez]
M. [real: “M.” is for Melville: mez]
Last name:
Coleman
First name variants:
Courtney [real: mez]
Albert F.
Last name:
Davis
First name variants:
Philip J. [real: mez]
David M.
Sam H.
Last name:
Floe
First name variants:
Unknown [likely “Carl F.”; referred to by last name in document also referencing “Corby”: mez]
Carl F. [real: mez]
Last name:
Green
First name variants:
Alan I.
W. D. [real; name is “Green, William D.”: mez]
J. W.
Last name:
Hansen
First name variants:
K. E. [real: mez]
R. J.
Last name:
Harris
First name variants:
Rufus
Louis
Last name:
Helwig
First name variants:
Frank C.
Diana B.
Last name:
Herring
First name variants:
Pendleton
Unknown
Last name:
Hill
First name variants:
Richard H.
Albert G.
Marjorie
Laura
Al
Last name:
Howard
First name variants:
R.
J.
Last name:
Hunter
First name variants:
G. Truman
Truman
P. L.
Last name:
Johnson
First name variants:
Howard W.
Eldon L.
Anthony
E. C.
Last name:
Jones
First name variants:
Dorothy P.
Fletcher
Robert E.
Unknown [IA 2019-05-09 no indication]
Last name:
Killian
First name variants:
James R.
T. J.
Last name:
Little
First name variants:
John D. C
J. A.
Last name:
Mann
First name variants:
Leonard A.
Edward S.
Last name:
Mason
First name variants:
E. A.
R. D.
Last name:
McCormack
First name variants:
James
E. L.
Last name:
Miller
First name variants:
Unknown
C. L.
S.
Last name:
Morris
First name variants:
J. C.
G. J.
Last name:
Nelson
First name variants:
Clifford V.
Robert A.
Last name:
Peterson
First name variants:
Carl M. F
L.
Last name:
Price
First name variants:
Daniel O.
B. G.
Last name:
Robertson
First name variants:
Harold
J. E.
Last name:
Shader
First name variants:
Melvin A.
Mel [still unclear whether this is Melvin; if we can confirm that Melvin A. is also a Dr., we will have sufficient information]
Last name:
Slater
First name variants:
John C.
John M.
Last name:
Smith
First name variants:
Paul A.
E. H.
Last name:
Stoddard
First name variants:
R. E.
P. A.
Last name:
Stratton
First name variants:
Julian S. (changed to Julius A., president of MIT; the letter is addressed to the MIT president)
Julius A.
Last name:
Thompson
First name variants:
Greg R.
T. J.
C. G.
Last name:
Tucker
First name variants:
Unknown
John A.
C. E.
Last name:
Unger
First name variants:
Ing H.
H.
Last name:
Unknown
First name variants:
Rosemary
Jewell
Elaine
Jane
Ray
Last name:
Verzuh
First name variants:
Edna Tamm
Frank M.
H. M. //actually F.M. fixed. sr
Last name:
Walker
First name variants:
Gordon L.
Eric
Last name:
Walsh
First name variants:
Fr Michael P
Joseph B.
Last name:
Wells
First name variants:
W. D.
W. H. [different people. //sr]
Last name:
Williams
First name variants:
Richard H.
Robert W.
I'm removing this from the deploy-ready milestone. At this point, it looks to me like we've cleaned most of the truly wrong metadata; there's still some sleuthing to do around many of the 'unknown' first-name folks, but this can become ongoing project maintenance work and work for the summer.
There are some duplicates left in our metadata sheet. Here's a list of likely candidates to look into. If you have either checked or fixed a last name, please add it here so I can update the list.