CreatingData / Historical-Populations

Historical US City populations
http://creatingdata.us/datasets/US-cities

Interpolate missing censuses #2

Open bmschmidt opened 6 years ago

bmschmidt commented 6 years ago

I notice that small-town Kentucky has no estimates for the year 1820, but a large number of cities (e.g. Bardstown) do have estimates for 1810 and 1830.

There are certainly a number of other cases like this (though I don't think any others cover most of a state).

The core data series should represent missing values explicitly. My inclination is to do so with the number '-1', so numeric computation is still easy. The overall estimates would be better if they interpolated each missing year as the geometric mean of the censuses on either side of it.
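As a rough illustration of that idea, here is a minimal sketch, not the repository's actual code. It assumes a town's series is a plain array of evenly spaced decennial counts with -1 marking a missing census. The name `interpolateMissing` and the sample figures are hypothetical. It fills only single-census gaps; longer runs of missing years are left as -1.

```javascript
// Sentinel for a missing census, as proposed in this thread.
const MISSING = -1;

// Hypothetical helper: fill a single missing census with the geometric
// mean of the censuses immediately before and after it. Assumes the
// entries are evenly spaced (one per decade).
function interpolateMissing(series) {
  const out = series.slice();
  for (let i = 1; i < series.length - 1; i++) {
    const prev = series[i - 1];
    const next = series[i + 1];
    // Only interpolate when both neighbors are real, positive counts.
    if (series[i] === MISSING && prev > 0 && next > 0) {
      out[i] = Math.round(Math.sqrt(prev * next));
    }
  }
  return out;
}

// Made-up figures for 1810, 1820, 1830 (not real census data).
console.log(interpolateMissing([400, MISSING, 900])); // [ 400, 600, 900 ]
```

The geometric mean corresponds to assuming a constant growth rate across the two decades, which is why it suits population counts better than the arithmetic mean.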

sergiocorreia commented 6 years ago

A related issue happens with the 1880 census. It reports a lot of unincorporated places (villages, etc.) that were not reported in 1870 or 1890 (unless the towns were fairly big).

The question then is how to extrapolate these numbers. Extrapolating backwards is particularly tricky, as we don't want to extrapolate to, e.g., a negative population. An option would be to use the growth rate of the township where the town exists, but that of course carries its own set of assumptions.
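To make the township option concrete, here is a minimal sketch under a strong assumption: that the town's share of its township's population stays constant between the two censuses. The name `backExtrapolate` and the figures are hypothetical, and nothing like this exists in the dataset code as far as this thread shows. Because it scales by a ratio rather than subtracting a difference, the result cannot go negative.

```javascript
// Hypothetical helper: estimate a town's population in a year it was
// not reported by scaling its known count by the township's ratio
// between the target year and the known year.
// Returns -1 (missing) when the inputs cannot support an estimate.
function backExtrapolate(townKnown, townshipKnown, townshipTarget) {
  if (townKnown < 0 || townshipKnown <= 0 || townshipTarget < 0) {
    return -1;
  }
  return Math.round((townKnown * townshipTarget) / townshipKnown);
}

// Made-up figures: town reported at 300 in 1880; its township had
// 1500 people in 1880 and 1200 in 1870.
console.log(backExtrapolate(300, 1500, 1200)); // 240
```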

bmschmidt commented 6 years ago

Confirmed. In some cases this might be connected to the other issue I filed today; a foundation date might be better than nothing for back-extrapolation, possibly in concert with township or county data.

In the short term, I think interpolation is an easier call than extrapolation.

Towns vanishing in 1890:

[Image: only 1890]

Towns vanishing in 1900 are largely confined to TX, TN, and CA:

[Image: 1890 not 1900]

Code used to generate the images above (for my own reference):

map.plotAPI({'year': 1900, "filters.Cities": "vanishes", 'yearOffset':10, 'scales.size.Cities': 'd => 4', 'drawing': ['Cities']})