Open YangLiu928 opened 8 years ago
We probably also need a better way to parse the original HTML to get the attributes (names, congressional district, etc.). The robustness of the current parsing method is questionable. It should rely more on regular expressions than on the position of text (e.g. currently the state name is pulled off as the last two letters from the
We also need to extract from the current name format (say, call it "display name") the first name, last name, middle initial, nickname(?) and prefix/suffix(?). Additionally, some names contain special characters (French/Spanish) that need special attention.
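
A rough sketch of what the display-name splitting could look like, in Python. Everything here is an assumption rather than existing code: the function names `parse_display_name` and `ascii_fold`, the prefix/suffix lists, and the guess that display names look roughly like `Prefix First "Nick" M. Last, Suffix`. It only illustrates the idea of tokenizing the name and normalizing accented characters; the real data may need different rules.

```python
import re
import unicodedata

# Hypothetical lookup sets; extend them to match what the real data contains.
PREFIXES = {"dr", "mr", "mrs", "ms", "rev"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}

# Assumes a nickname is wrapped in straight/curly double quotes or parentheses.
NICKNAME_RE = re.compile(r'["“(]([^"”)]+)["”)]')


def ascii_fold(text):
    """Return text with accents stripped, e.g. for building search keys."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def parse_display_name(display_name):
    """Split a display name into parts (assumed format, see lead-in)."""
    # NFC keeps accented letters as single code points so they compare reliably.
    name = unicodedata.normalize("NFC", display_name).strip()

    # Pull out the nickname first so it does not confuse the tokenizer.
    nickname = None
    match = NICKNAME_RE.search(name)
    if match:
        nickname = match.group(1).strip()
        name = name[:match.start()] + " " + name[match.end():]

    tokens = name.replace(",", " ").split()

    prefix = suffix = None
    if tokens and tokens[0].rstrip(".").lower() in PREFIXES:
        prefix = tokens.pop(0)
    if tokens and tokens[-1].rstrip(".").lower() in SUFFIXES:
        suffix = tokens.pop()

    first = tokens[0] if tokens else None
    last = tokens[-1] if len(tokens) > 1 else None

    # A middle initial is a single letter (optionally followed by a period)
    # sitting between the first and last tokens.
    middle_initial = None
    for token in tokens[1:-1]:
        if re.fullmatch(r"\w\.?", token):
            middle_initial = token.rstrip(".")
            break

    return {
        "prefix": prefix,
        "first": first,
        "middle_initial": middle_initial,
        "last": last,
        "nickname": nickname,
        "suffix": suffix,
    }


# Made-up example name, not taken from the real data set.
parts = parse_display_name('Dr. José "Pepe" A. Núñez, Jr.')
```

Known gaps in this sketch: multi-word last names (common in Spanish) end up with only the final word as `last`, and full middle names are ignored. Those cases probably need rules based on what the real data looks like.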