datamade / probablepeople

A Python library for parsing unstructured Western names into name components.
http://parserator.datamade.us/probablepeople
MIT License
593 stars, 71 forks

undocumented labels #43

Open az0 opened 7 years ago

az0 commented 7 years ago

This page documents 19 labels: https://probablepeople.readthedocs.io/en/latest/

However, it is missing these labels: CorporationCommitteeType, CorporationNameAndCompany, CorporationNameBranchIdentifier, CorporationNameBranchType, OtherCorporationName, SecondCorporationCommitteeType, SecondCorporationName, SecondCorporationNameAndCompany, SecondCorporationNameBranchIdentifier, SecondFirstInitial, SecondGivenName, SecondMiddleInitial, SecondMiddleName, SecondPrefixMarital, SecondSuffixOther, SecondSurname

jeancochrane commented 7 years ago

Good catch! It's true that we're missing some of these. However, some of them are compound names: SecondFirstInitial, for example, is the same type of thing as FirstInitial, the only difference being that parserator found two of them in one string and needs to differentiate them. So I'm not sure how useful it would be to have them in the docs. It might be best to keep the docs up to date with the options that are available for console labeling.
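To illustrate the compound-name point, here is a minimal sketch that maps a "Second"-prefixed label back to its base label. It assumes the compound labels are formed purely by prefixing "Second", which is inferred from the label names in this thread and not confirmed as library behavior; `base_label` is a hypothetical helper, not part of probablepeople.

```python
SECOND_PREFIX = "Second"


def base_label(label):
    """Return the label with a leading 'Second' stripped, if present.

    Hypothetical helper: assumes compound labels are just 'Second' + base label.
    """
    if label.startswith(SECOND_PREFIX) and len(label) > len(SECOND_PREFIX):
        return label[len(SECOND_PREFIX):]
    return label


# A few of the undocumented labels listed earlier in this issue.
missing = [
    "SecondFirstInitial",
    "SecondGivenName",
    "SecondSurname",
    "OtherCorporationName",
]

# Keep only the labels that are compound variants of some base label.
compound = {
    label: base_label(label)
    for label in missing
    if base_label(label) != label
}
```

Under that assumption, `compound` pairs each "Second" label with the label it duplicates, while labels such as OtherCorporationName are left out because they have no "Second" prefix.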

az0 commented 7 years ago

I needed an exhaustive list of field names for the CSV DictWriter, which would crash on every field name that was not in my list. Once I had a full list, I could also order the fields in a logical way (basically: person 1, person 2, company).
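For readers who hit the same problem, here is a rough sketch of the behavior described above. By default `csv.DictWriter` raises `ValueError` when a row contains a key that is not in `fieldnames`. The field list here is a deliberately partial, hypothetical ordering (person 1, person 2, company), and the rows are hand-written stand-ins for parser output, not real results from probablepeople.

```python
import csv
import io

# Hypothetical, partial field order: person 1, person 2, company.
# Label names are taken from this thread and the linked docs; not exhaustive.
FIELDNAMES = [
    "GivenName", "Surname",
    "SecondGivenName", "SecondSurname",
    "CorporationName",
]


def write_rows(rows, fieldnames=FIELDNAMES, strict=True):
    """Write label->value dicts as CSV text.

    With strict=True, an unknown label raises ValueError (the default
    DictWriter behavior); with strict=False, unknown labels are dropped.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        restval="",  # fill labels that a row does not have
        extrasaction="raise" if strict else "ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hand-written sample rows standing in for tagged names.
rows = [
    {"GivenName": "Ann", "Surname": "Lee",
     "SecondGivenName": "Bob", "SecondSurname": "Lee"},
    {"CorporationName": "Acme"},
]
csv_text = write_rows(rows)
```

Passing `strict=False` is one way to avoid the crash without an exhaustive list, at the cost of silently dropping any label that is not listed.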

Other people's needs vary. It's up to you.

jeancochrane commented 7 years ago

Makes sense to me. Perhaps two lists would be most helpful (the labels available for console labeling + the full set of labels used under the hood), with an explanation of why they're separate.