Closed SPTKL closed 4 years ago
Also note that for many records, we have P > 100, if we use the current defined way to identify the total, e.g. CNI1864_1
for SI28
>>> lst = ['1400000US36085020700', '1400000US36085021300', '1400000US36085024700', '1400000US36085025100']
>>> df.loc[df.GEO_ID.isin(lst), ['DP02_0074E', 'DP02_0001E']]
938 3521.0 1787.0
940 1917.0 1012.0
941 4294.0 2133.0
954 3192.0 1566.0
it's clear that for each of these tracts, DP02_0074E > DP02_0001E
, so do we have a special calculation rule here that we need to apply?
anything labeled a base in the data dictionary will be set to 100 in the percent column.
From this query we can see that for some variables we have fairly different results for the number of records with 100 as percentage estimate.
Need to figure out if we need to set P=100 for certain variables by definition, if so, how to identify those variables