YangLiu928 / NDP_Projects

This is the central repository for projects at NDP analytics
MIT License

Need further name parsing for committee assignment web scraping #4

Open YangLiu928 opened 8 years ago

YangLiu928 commented 8 years ago

We need to extract the first name, last name, middle initial, nickname (?), and prefix/suffix (?) from the current name format (call it the "display name"). Additionally, some names contain special characters (e.g. French or Spanish accented letters) that need special attention.
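One possible starting point is sketched below. The issue does not specify the display-name format, so this assumes a hypothetical layout of `[Prefix] First [M.] ["Nickname"] Last[, Suffix]`. The names `ParsedName`, `parse_display_name`, and `ascii_fold`, and the prefix/suffix lists, are illustrative and not part of the existing scraper. Accented characters are normalized to NFC so they are kept intact, and `ascii_fold` gives an accent-free key for matching.

```python
import re
import unicodedata
from dataclasses import dataclass

# Hypothetical lookup sets; extend them to match what the scraped pages contain.
PREFIXES = {"mr", "mrs", "ms", "dr", "rep", "sen", "hon"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


@dataclass
class ParsedName:
    prefix: str = ""
    first: str = ""
    middle_initial: str = ""
    nickname: str = ""
    last: str = ""
    suffix: str = ""


def _key(token):
    """Lower-case a token and drop trailing punctuation for set lookups."""
    return token.rstrip(".,").lower()


def parse_display_name(display_name):
    """Split an assumed 'Prefix First M. "Nick" Last, Suffix' string into parts."""
    # NFC makes 'e' + combining accent equal to the precomposed character.
    text = unicodedata.normalize("NFC", display_name).strip()
    parsed = ParsedName()

    # A nickname is assumed to be quoted or parenthesised.
    match = re.search(r'["“(]([^"”)]+)["”)]', text)
    if match:
        parsed.nickname = match.group(1).strip()
        text = text[:match.start()] + " " + text[match.end():]

    tokens = text.replace(",", " ").split()
    if tokens and _key(tokens[0]) in PREFIXES:
        parsed.prefix = tokens.pop(0)
    if tokens and _key(tokens[-1]) in SUFFIXES:
        parsed.suffix = tokens.pop()
    if tokens:
        parsed.first = tokens.pop(0)
    # A single letter (optionally followed by a period) before the last name
    # is treated as the middle initial.
    if len(tokens) > 1 and re.fullmatch(r"[^\W\d_]\.?", tokens[0]):
        parsed.middle_initial = tokens.pop(0).rstrip(".")
    # Whatever remains is the last name, which may have several words.
    parsed.last = " ".join(tokens)
    return parsed


def ascii_fold(text):
    """Return an accent-free copy of text, useful as a matching key."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

For example, `parse_display_name('Rep. John Q. "Jack" Public, Jr.')` would fill all six fields, and `ascii_fold("Peña")` gives `"Pena"`. Names in a `Last, First` order would need a separate branch.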

YangLiu928 commented 8 years ago

We probably also need a better way to parse the original HTML to get the attributes (names, congressional district, etc.). The robustness of the current parsing method is questionable. It should rely more on regular expressions than on the position of text (e.g. currently the state name is pulled off as the last two letters of the tag content).
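As a rough illustration of the regex-based approach, the sketch below matches the whole tag text against one pattern instead of slicing by position. The real tag content format is not shown in this issue, so the layout `Name, ST-NN` (for example `John Q. Public, NY-15`) is an assumption, and `MEMBER_RE` and `parse_member_text` are hypothetical names.

```python
import re

# Assumed layout: "<name>, <two-letter state>[-<district number or AL>]".
MEMBER_RE = re.compile(
    r"""^\s*
        (?P<name>.+?)\s*,\s*                  # everything before the state
        (?P<state>[A-Z]{2})                   # two upper-case letters
        (?:\s*-\s*(?P<district>\d{1,2}|AL))?  # optional district, AL = at-large
        \s*$""",
    re.VERBOSE,
)


def parse_member_text(tag_text):
    """Return a dict with name, state and district, or None if the text does not match."""
    match = MEMBER_RE.match(tag_text)
    if match is None:
        # Fail loudly to the caller instead of guessing from character positions.
        return None
    return match.groupdict()
```

Unlike taking the last two characters, this returns `None` for text that does not fit the pattern, so unexpected markup can be logged and inspected rather than silently producing a wrong state. Trailing whitespace or a district suffix no longer shifts the result.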