dhmit / computation_hist

Archival History of the MIT Computation Center
BSD 3-Clause "New" or "Revised" License

Fix Names in Metadata #351

Open srisi opened 5 years ago

srisi commented 5 years ago

There are some duplicates left in our metadata sheet. Here's a list of likely candidates to look into. If you have either checked or fixed a last name, please add it here so I can update the list.


Last name:
        Adams [DONE ra 2019-05-02] 
First name variants:
         W. C. [transcription error] 
         C. W. [standardized to this]

Last name: 
        Blackburn [DONE ra 2019-05-02]
First name variants:
         Jacob F. [standardized to this - Jack is a nickname throughout]
         Jack F.

Last name:
        Brown [DONE ra 2019-05-02]
First name variants:
         Gordon S.
         E. Cary [genuinely a different person!]
         Dean [this is G.S. Brown's title]
         C. S. [transcription error - actually G.S. for Gordon S.]

Last name:
        Cusick [DONE ra 2019-05-02]
First name variants:
         Unknown [both cases of this seem to be Paul V.]
         Paul V.

Last name:
        Fano [DONE ra 2019-05-02]
First name variants:
         M. R. [transcription error]
         R. M.

Last name:
        Floe [maybe done? ra 2019-05-02]
First name variants:
         Carl F.corbató [there's a missing semicolon after 'F.' here, which has conflated a couple of names into one entry... I've fixed this in the metadata, but this problem may have been masking other problems...]
         Unknown 

Last name:
        Glaser [DONE ra 2019-05-02]
First name variants:
         Erza [transcription error]
         Ezra 

Last name:
        Green [Done IA 2019-05-09] [ the names are different people ]
First name variants:
         J. W.
         W. D.

Last name:
        Herring
First name variants:
         Pendleton 
         Unknown [IA 2019-05-09 - I can't see how people identified the cc on documents written by Herring, Unknown] 

Last name:
        Hill [IA 2019-05-09 - no indication these are the same person; Al and Albert could be different people]
First name variants:
         Albert G.
         Al

Last name:
        Hunter [Done IA 2019-05-09]
First name variants:
         G. Truman
         Truman 

Last name:
        Hurd [DONE ra 2019-05-02]
First name variants:
         Cuthbert C.
         Unknown [seems to be Cuthbert -- 2_3_morse_correspondence_m_z_77]

Last name:
        Johnson
First name variants:
         Eldon L.
         E. C.

Last name:
        Jones
First name variants:
         Dorothy P.
         Robert E.
         Unknown [IA 2019-05-09 no indication]

Last name:
        Little
First name variants:
         John D. c
         J. A.

Last name:
        Maxwell [DONE  ra 2019-05-02 - these are genuinely different people]
First name variants:
         Joseph R.
         I. R.

Last name:
        Mccarthy
First name variants:
         Unknown 
         John 

Last name:
        Mccormack
First name variants:
         James 
         E. L.

Last name:
        Morse [DONE ra 2019-05-02]
First name variants:
         Philip M.
         F. M. [foreign correspondent misspelling of Philip]

Last name:
        Mosteller [done ra 2019-05-09 - these are all C. Frederick Mosteller, Harvard statistician]
First name variants:
         Frederick 
         Unknown 
         C. F.

Last name:
        Murray [Done IA 2019-05-09 Different people]
First name variants:
         Jane F.
         J. M.

Last name:
        Mussard [DONE ra 2019-05-02]
First name variants:
         Jean M. [transcription error]
         Jean A. 

Last name:
        Panov [DONE ra 2019-05-02]
First name variants:
         Yu D. [transcription error]
         D. Yu

Last name:
        Pigford [DONE ra 2019-05-02]
First name variants:
         Thomas H.
         T. J. [index finger typo in the original! - clearly Thomas H., as the office is the same]

Last name:
        Reissner [Done IA 2019-05-08]
First name variants:
         R. [R looks like E, you have to zoom right in to see]
         E. 

Last name:
        Shader
First name variants:
         Melvin A.
         Unknown [IA 2019-05-08 this is Melvin A.]
         Mel [still unclear if this is Melvin, if we can find out if Melvin A. is also a Dr. then we have sufficient information]

Last name:
        Steinberg
First name variants:
         Unknown 
         J. R.

Last name:
        Tucker
First name variants:
         Unknown 
         John A.
         C. E.

Last name:
        Unknown
First name variants:
         Rosemary 
         Elaine 
         Jewell 
         Jane 

Last name:
        Verzuh
First name variants:
         Edna Tamm
         Frank M.
         H. M.

Last name:
        Webber [done RA 2019-05-09]
First name variants:
         Unknown [pretty clearly Roger from the other folks on the doc]
         Roger P.
         D. S. r [this is Roger - D.S.R. is the abbreviation for his title]

Last name:
        Wells [done (?) ra 2019-05-09 - these seem to be different, but there isn't really enough info in the documents to tell... it may be that one of the docs (1_25_proposed_conference_28) to "Wells" is not to W.D. but actually to W.H., but I don't have enough info here to tell...]
First name variants:
         W. D.
         W. H.

Last name:
        Weyl [DONE ra 2019-05-02]
First name variants:
         F. Joachim
         Joachim F. [transcription error]
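
For anyone who wants to regenerate a list like the one above, here is a minimal sketch of one way to do it. It assumes the metadata has already been reduced to (last name, first name) pairs. The function name `find_name_variants` is hypothetical and not taken from the repository, and the sample rows are borrowed from the list above purely for illustration.

```python
from collections import defaultdict


def find_name_variants(people):
    """Group distinct first-name variants by last name.

    `people` is an iterable of (last, first) tuples. Only last names with
    more than one distinct first-name variant are returned.
    """
    variants = defaultdict(set)
    for last, first in people:
        # Treat a missing first name as "Unknown", as the list above does.
        variants[last.strip()].add(first.strip() or "Unknown")
    return {
        last: sorted(firsts)
        for last, firsts in sorted(variants.items())
        if len(firsts) > 1
    }


# Illustrative rows only; the real input would come from the metadata sheet.
sample = [
    ("Fano", "R. M."),
    ("Fano", "M. R."),
    ("Glaser", "Ezra"),
    ("Glaser", "Erza"),
    ("Morse", "Philip M."),
]

for last, firsts in find_name_variants(sample).items():
    print(f"Last name:\n        {last}")
    print("First name variants:")
    for first in firsts:
        print(f"         {first}")
    print()
```

This only surfaces candidates. Whether two variants are the same person still has to be checked against the documents, as the annotations above show.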
ryaanahmed commented 5 years ago

I think we're calling nameparser's capitalize method incorrectly - putting a pin in this with this comment to come back - see, e.g., Mccormack above, which is correctly transcribed McCormack in the spreadsheet/csv.

I don't think this stuff in name_parser.py does quite what we want it to:

    name = HumanName(name_raw)
    # If first and middle initials have periods but not spaces -> separate, e.g. "R.K. Teague"
    if re.match('[a-zA-Z]\.[a-zA-Z]\.', name.first):
        name.middle = name.first[2]
        name.first = name.first[0]

    name.last = name.last.capitalize()
    name.first = name.first.strip('.').capitalize()
    name.middle = name.middle.strip('.').capitalize()

Poked around a little, and I think the thing to do is to set all of the name fields and then run name.capitalize() on the whole thing to modify it in place, and then extract name.last, name.first, and name.middle. Will take a proper look tmrw.
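
The initial-splitting step in the snippet above can be checked on its own, without nameparser. The following is a standard-library sketch, not the code in name_parser.py: the helper name `split_fused_initials` is hypothetical, and unlike the original `re.match` call it anchors the pattern at both ends, which is an assumption about the intended behavior.

```python
import re

# Two single-letter initials, each followed by a period, with no space between.
_FUSED_INITIALS = re.compile(r"^([A-Za-z])\.([A-Za-z])\.$")


def split_fused_initials(first):
    """Return (first, middle) initials for input like "R.K.", or None otherwise."""
    match = _FUSED_INITIALS.match(first)
    if match:
        return match.group(1).upper(), match.group(2).upper()
    return None


print(split_fused_initials("R.K."))    # ('R', 'K')
print(split_fused_initials("Philip"))  # None
```

Using capture groups instead of fixed string indexes (`name.first[2]`, `name.first[0]`) keeps the extraction tied to what the pattern actually matched.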

ryaanahmed commented 5 years ago

@srisi

In [1]: from nameparser import HumanName

In [2]: name = HumanName('McCormack, E. L.')

In [3]: name.last
Out[3]: 'McCormack'

In [4]: name.last.capitalize()
Out[4]: 'Mccormack'

whereas...

In [15]: name = HumanName('Mccormack, E. L.')

In [16]: name.capitalize(force=True)

In [17]: name.last
Out[17]: 'McCormack'

... which does seem off, but there you go. I'll fix and make a PR.
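
The reason `name.last.capitalize()` flattens the name is that `name.last` is a plain string, so that call is Python's `str.capitalize()`, which upper-cases the first character and lower-cases everything after it. The sketch below shows that behavior next to a hypothetical helper, `capitalize_surname`, that keeps the inner capital after "Mc". The helper is only an illustration of the idea, handles nothing but the "Mc" prefix, and is not nameparser's implementation.

```python
import re

# Hypothetical helper pattern: an "Mc" prefix followed by the rest of the surname.
_MC_PREFIX = re.compile(r"^(mc)(\w+)$", re.IGNORECASE)


def capitalize_surname(surname):
    """Capitalize a surname, keeping the inner capital after "Mc"."""
    match = _MC_PREFIX.match(surname)
    if match:
        return match.group(1).capitalize() + match.group(2).capitalize()
    return surname.capitalize()


# str.capitalize() lower-cases everything after the first character,
# which is what turns an already correct "McCormack" into "Mccormack".
print("McCormack".capitalize())         # Mccormack
print(capitalize_surname("Mccormack"))  # McCormack
print(capitalize_surname("MORSE"))      # Morse
```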

ryaanahmed commented 5 years ago

[2019-05-13 -- this comment replaced with updated list below]

erica02139 commented 5 years ago

Computation Center phone directory records from 1955-56 will help address some of these; we'll have these visually tomorrow.

ryaanahmed commented 5 years ago

For some reason @erica02139's latest edited list, sent by email, didn't get attached to this issue. Here it is, replacing my comment above:

[deleted -- old, replaced by Erica's list below]

erica02139 commented 5 years ago

Thanks, @ryaanahmed! Have done more; will post tonight. Also, here are the Computation Center directory records from 1956-1963 (will post on Slack, too): https://drive.google.com/open?id=1-34DNXHFO6kgb2lXIzL9W-qouWJzN8RQ

mscuthbert commented 5 years ago

Perfect -- I just went over and took lots of pictures of the area. Will make a story.

erica02139 commented 5 years ago

Last name:
    Arden
First name variants:
    Bruce W. [real: mez]
    Unknown [2-14, p. 61 is likely “Arden, Dean N.”; 3-9, p. 3 and 3-10, p. 88 likely “Arden, Bruce W.” who was “Mr.” not “Dr.”]
    Dean M. [chg to “Dean N.”: mez]
    Dean A. [chg to “Dean N.”: text is “Dean”: mez]
    Dean N. [real: mez]

Last name:
    Brown
First name variants:
    Sanborn C. [real: mez]
    Gordon S. [real: mez]
    Unknown [chg to “Brown, Gordon S.”: mez]
    E. Cary [real: mez]

Last name:
    Caldwell
First name variants:
    Samuel H. [real: mez]
    David O. [real: mez]

Last name:
    Campbell
First name variants:
    Elizabeth J. [real: mez]
    Ashley S. [real on 2-14, p. 102; mis-entered for 2-14, p. 85 (chg to “unknown”): mez]
    Unknown [chg to “Ashley S.”: mez]
    Pamela [real: added details to 2-26, doc 2 “note” field: mez]

Last name:
    Case
First name variants:
     Harold [real: mez]
     Leon W. [real: mez]

Last name:
    Clark
First name variants:
     George W. [real: mez]
     M. [real: “M.” is for Melville: mez]

Last name:
    Coleman
First name variants:
     Courtney [real:mez]
     Albert F.

Last name:
    Davis
First name variants:
     Philip J. [real:mez]
     David M.
     Sam H.

Last name:
    Floe
First name variants:
     Unknown [likely “Carl F.”; referred to by last name in document also referencing “Corby”: mez]
     Carl F. [real:mez]

Last name:
    Green
First name variants:
     Alan I.
     W. D. [real; name is “Green, William D.”: mez]
     J. W.

Last name:
    Hansen
First name variants:
     K. E. [real: mez]
     R. J.

Last name:
    Harris
First name variants:
     Rufus 
     Louis 

Last name:
    Helwig
First name variants:
     Frank C.
     Diana B.

Last name:
    Herring
First name variants:
     Pendleton 
     Unknown 

Last name:
    Hill
First name variants:
     Richard H.
     Albert G.
     Marjorie 
     Laura 
     Al 

Last name:
    Howard
First name variants:
     R. 
     J. 

Last name:
    Hunter
First name variants:
     G. Truman
     Truman 
     P. L.

Last name:
    Johnson
First name variants:
     Howard W.
     Eldon L.
     Anthony 
     E. C.

Last name:
    Jones
First name variants:
     Dorothy P.
     Fletcher 
     Robert E.
     Unknown [IA 2019-05-09 no indication]

Last name:
    Killian
First name variants:
     James R.
     T. J.

Last name:
    Little
First name variants:
     John D. C
     J. A.

Last name:
    Mann
First name variants:
     Leonard A.
     Edward S.

Last name:
    Mason
First name variants:
     E. A.
     R. D.

Last name:
    McCormack
First name variants:
     James 
     E. L.

Last name:
    Miller
First name variants:
     Unknown 
     C. L.
     S. 

Last name:
    Morris
First name variants:
     J. C.
     G. J.

Last name:
    Nelson
First name variants:
     Clifford V.
     Robert A.

Last name:
    Peterson
First name variants:
     Carl M. F
     L. 

Last name:
    Price
First name variants:
     Daniel O.
     B. G.

Last name:
    Robertson
First name variants:
     Harold 
     J. E.

Last name:
    Shader
First name variants:
     Melvin A.
     Mel [still unclear if this is Melvin, if we can find out if Melvin A. is also a Dr. then we have sufficient information]

Last name:
    Slater
First name variants:
     John C.
     John M.

Last name:
    Smith
First name variants:
     Paul A.
     E. H.

Last name:
    Stoddard
First name variants:
     R. E.
     P. A.

Last name:
    Stratton
First name variants:
     Julian S. (changed to Julius A., president of MIT. Letter is addressed to MIT president).
     Julius A.

Last name:
    Thompson
First name variants:
     Greg R.
     T. J.
     C. G.

Last name:
    Tucker
First name variants:
     Unknown 
     John A.
     C. E.

Last name:
    Unger
First name variants:
     Ing H.
     H. 

Last name:
    Unknown
First name variants:
     Rosemary 
     Jewell 
     Elaine 
     Jane 
     Ray 

Last name:
    Verzuh
First name variants:
     Edna Tamm
     Frank M.
     H. M.   //actually F.M. fixed. sr

Last name:
    Walker
First name variants:
     Gordon L.
     Eric 

Last name:
    Walsh
First name variants:
     Fr Michael P
     Joseph B.

Last name:
    Wells
First name variants:
     W. D.
     W. H. [different people. //sr]

Last name:
    Williams
First name variants:
     Richard H.
     Robert W.

ryaanahmed commented 5 years ago

I'm removing this from the deploy-ready milestone. At this point, it looks to me like we've cleaned most of the truly wrong metadata; there's still some sleuthing to do around many of the 'unknown' first-name folks, but this can become ongoing project maintenance work and work for the summer.