CreatingData / Historical-Populations

Historical US City populations
http://creatingdata.us/datasets/US-cities

Include establishment year for cities where in Wikipedia #3

Open bmschmidt opened 6 years ago

bmschmidt commented 6 years ago

Wikipedia frequently includes--either as structured text or in the first few paragraphs--the establishment date for a city. This would be useful for thematic mapping. If you make a map of Georgia in 1836, you would want to include some of the cities from the 1840 census, but others may not have been established until the end of the decade.

This is especially important when trying to compare city locations against other variables--e.g., railroads or Indian land cessions.
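As a rough illustration of the filtering this would enable, here is a minimal sketch. The `City` record, its field names, and the sample towns and years are all invented for the example and do not come from this repository. It assumes that a city with no known establishment date should be kept only if it already appears in a census at or before the map year.

```python
from typing import List, NamedTuple, Optional


class City(NamedTuple):
    # Hypothetical record layout; not the repository's actual schema.
    name: str
    first_census_year: int
    established: Optional[int]  # None when no establishment date is known


def cities_for_map(cities: List[City], map_year: int) -> List[City]:
    """Return the cities that plausibly existed in map_year."""
    kept = []
    for city in cities:
        if city.established is not None:
            # Trust the establishment date when we have one.
            if city.established <= map_year:
                kept.append(city)
        elif city.first_census_year <= map_year:
            # Assumption: without a date, fall back to the first census year.
            kept.append(city)
    return kept


# Invented sample data, for illustration only.
sample = [
    City("Town A", 1840, 1832),
    City("Town B", 1840, 1838),
    City("Town C", 1840, None),
]
names_1836 = [c.name for c in cities_for_map(sample, 1836)]
```

With these made-up records, only "Town A" survives for an 1836 map: "Town B" was established later, and "Town C" has no date and first appears in 1840.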

sergiocorreia commented 6 years ago

Note that there are a variety of labels for the founding date. I wrote an alternative Python script that scrapes Wikipedia (can share it if useful), and some of the labels I found include:

'founded', 'incorporated', 'settled', 'established', 'platted', 'chartered', ...
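To make the label list concrete, here is a minimal sketch of pulling (label, year) pairs out of already-downloaded article text with a single regular expression. This is not the script mentioned above. The 40-character window between label and year, the accepted year range (1500 to 2099), and the sample sentence are assumptions chosen for illustration.

```python
import re

# Labels listed in this thread; the real list is likely longer.
LABELS = ("founded", "incorporated", "settled", "established", "platted", "chartered")

# A label, then up to 40 characters within the same sentence, then a 4-digit year.
PATTERN = re.compile(
    r"\b(" + "|".join(LABELS) + r")\b[^.\n]{0,40}?\b(1[5-9]\d{2}|20\d{2})\b",
    re.IGNORECASE,
)


def find_dates(text):
    """Return a list of (label, year) pairs found in text, in order of appearance."""
    return [(m.group(1).lower(), int(m.group(2))) for m in PATTERN.finditer(text)]


# Invented sample text, not taken from any real article.
pairs = find_dates("Settled 1821. Incorporated on March 3, 1843.")
```

A pattern like this will produce false positives (for example, a year that belongs to a different clause), so the output would still need review or a stricter pattern per label.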

bmschmidt commented 6 years ago

Yeah, I also noticed 'platted' and saw this would require a bit of text mining to build up a list.

I imagine this would have to be a hierarchy of regular expressions. A "founded" date is generally better than an "incorporated" one. But I don't know how to resolve some of them: 'settled' is vague (it may also include populations not counted by the census, like native settlements and pre-cession Mexican towns/missions, which are treated a little ambiguously by this repository).
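A hierarchy like that could be sketched as an ordered list of patterns tried in turn, returning the first match. Only "founded before incorporated" comes from the discussion above. The position of every other label, including putting 'settled' last because it is vague, is an assumption, as are the 40-character window and the year range.

```python
import re

# A 4-digit year between 1500 and 2099 (assumed range).
YEAR = r"\b(1[5-9]\d{2}|20\d{2})\b"

# Highest-priority label first. Only founded > incorporated is from the thread;
# the rest of the ordering is a guess.
ORDERED_LABELS = ("founded", "established", "platted", "chartered", "incorporated", "settled")

HIERARCHY = [
    (label, re.compile(rf"\b{label}\b[^.\n]{{0,40}}?{YEAR}", re.IGNORECASE))
    for label in ORDERED_LABELS
]


def best_date(text):
    """Return (label, year) from the highest-priority pattern that matches, else None."""
    for label, pattern in HIERARCHY:
        match = pattern.search(text)
        if match:
            return label, int(match.group(1))
    return None


# Invented sample text, for illustration only.
choice = best_date("Incorporated 1843. Founded in 1821.")
```

Keeping the label alongside the year lets downstream users decide for themselves whether, say, a 'settled' date is acceptable for their map.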

Would love to see your Python script.