Apply Propagation of Error to determine a measure of error

Gastastrophe commented 2 years ago

Since the process involves multiplying by constant weights, adding, and dividing by constant weights, it should be possible to interpolate the error measurements directly to propagate the error.

Gastastrophe commented 2 years ago

Since we have no knowledge of the actual distribution of the population within a census block, the error in the point cloud distribution cannot be determined (as far as I know). We will therefore treat the point cloud, and hence the weights derived from it, as constants.

Gastastrophe commented 2 years ago

Since the assignment of 2010 Census block data to points on our point cloud is the application of n functions f_n(x) (one for each point), each of which takes the population of the census block as input and multiplies it by the weight corresponding to that point, we can apply the formula for propagating error through multiplication by a constant, df_n = |w_n| * dx, where w_n is the weight for the nth point and d is the error operator. More formally, we can consider f_n(x_1, ..., x_K) = <0, ..., w_n, ..., 0> dot <x_1, ..., x_K>, where x_k is the population of the kth census block, K is the number of blocks, and w_n occupies the position aligned with the block containing the nth point. This allows {f_n} to be a family of functions over the same variables with error functions df_n = |w_n| * dx_k. Since k is determined entirely by n for the purposes of this process, we can instead write df_n = |w_n| * dx_n.
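A minimal sketch of this step in Python, assuming the block margins of error, the point-to-block assignment, and the point weights are available as plain lists. The names `point_errors`, `weights`, `point_block`, and `block_moe` are hypothetical and not taken from the repository.

```python
def point_errors(weights, point_block, block_moe):
    """Propagate block-level errors to points: df_n = |w_n| * dx_{k(n)}.

    weights[n]     -- w_n, the (constant) weight of point n (hypothetical input)
    point_block[n] -- index k of the census block containing point n
    block_moe[k]   -- dx_k, the error of the population of block k
    """
    return [abs(w) * block_moe[k] for w, k in zip(weights, point_block)]


# Illustrative numbers only: two points in block 0, one point in block 1.
print(point_errors([0.25, 0.75, 1.0], [0, 0, 1], [40.0, 10.0]))  # [10.0, 30.0, 10.0]
```

Because k is a function of n, the lookup `block_moe[k]` is all that the dot-product formulation reduces to in practice.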

Gastastrophe commented 2 years ago

When propagating error in variables of interest being passed to points on the point cloud, we first calculate the error from converting the variables to ratios of the population. Since this is simple division, the error of the function g(a,b) = a/b, where a is a variable of interest and b is the population, is dg = |g(a,b)| sqrt([da/a]^2 + [db/b]^2). Next, we find the error in applying these ratios to points, which is the simple function h(r,p) = r * p, where r is the ratio derived from g and p is the population at the point derived from f. The error formula for h is then dh = |h(r,p)| sqrt([dr/r]^2 + [dp/p]^2).
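The two error formulas above can be written as small helpers. This is a sketch assuming independent errors and nonzero values (the relative errors divide by a, b, r, and p); the function names are hypothetical.

```python
import math


def quotient_error(a, da, b, db):
    """dg for g(a, b) = a / b, assuming independent errors and nonzero a, b."""
    g = a / b
    return abs(g) * math.sqrt((da / a) ** 2 + (db / b) ** 2)


def product_error(r, dr, p, dp):
    """dh for h(r, p) = r * p, assuming independent errors and nonzero r, p."""
    h = r * p
    return abs(h) * math.sqrt((dr / r) ** 2 + (dp / p) ** 2)


# Illustrative numbers only: a variable of interest of 300 +/- 30
# in a population of 1000 +/- 40.
dg = quotient_error(300.0, 30.0, 1000.0, 40.0)
print(dg)
```

Both helpers work in relative errors, which is why the same pattern covers division and multiplication.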

Fixing a variable of interest (since this process is independent for each variable), we substitute r = a/b, whose relative error is dr/r = sqrt([da/a]^2 + [db/b]^2), and p = w_n * x_n, whose relative error is dp/p = |w_n| dx_n / |w_n * x_n| = dx_n/x_n. Squaring dr/r removes its square root, so the updated error function for the variable at the nth point is dh_n(x_n, a, b) = |a/b * w_n * x_n| sqrt([da/a]^2 + [db/b]^2 + [dx_n/x_n]^2). As a note, when interpolating population data itself, there is no reason to transfer data from census tracts, since we are already using population counts at the census block level; hence the error for this variable remains df_n.
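As a sanity check, the combined formula can be compared against chaining the quotient and product rules step by step. A minimal sketch under the same independence assumption; `point_variable_error` is a hypothetical name and the numbers are made up for illustration.

```python
import math


def point_variable_error(a, da, b, db, w, x, dx):
    """dh_n for h_n = (a / b) * (w_n * x_n), assuming independent errors.

    The square root inside dr/r cancels against its square, so the three
    relative errors simply add in quadrature. Assumes a, b, x are nonzero.
    """
    h = (a / b) * w * x
    rel_sq = (da / a) ** 2 + (db / b) ** 2 + (dx / x) ** 2
    return abs(h) * math.sqrt(rel_sq)


# Cross-check against the two-step chain g -> h.
a, da, b, db = 300.0, 30.0, 1000.0, 40.0
w, x, dx = 0.5, 200.0, 20.0
r = a / b
dr = abs(r) * math.sqrt((da / a) ** 2 + (db / b) ** 2)  # dg
p = w * x
dp = abs(w) * dx                                         # df_n
chained = abs(r * p) * math.sqrt((dr / r) ** 2 + (dp / p) ** 2)
print(math.isclose(point_variable_error(a, da, b, db, w, x, dx), chained))  # True
```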

Gastastrophe commented 2 years ago

For the last step, we sum these points into 2020 census tracts using a family of functions t_m(p_1, ..., p_n) = sum_{i=1}^n (delta_{m,i} p_i), where p_i is the value obtained from h_i, delta_{m,i} = 1 if the ith point lies in the mth tract, and delta_{m,i} = 0 otherwise. Adding the point errors in quadrature (which treats them as independent) gives our final family of error functions for interpolated measurements: dt_m(a, b) = sqrt( sum_{i=1}^n delta_{m,i} * |a/b * w_i * x_i|^2 * ([da/a]^2 + [db/b]^2 + [dx_i/x_i]^2) ).
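Because delta_{m,i} only selects which points belong to a tract, the aggregation can be sketched as a grouped sum of squared point errors. This assumes the point errors dh_i have already been computed and that the tract assignment is given as a list (`point_tract`, hypothetical); like the formula, it treats the point errors as independent.

```python
import math
from collections import defaultdict


def tract_errors(point_tract, point_errors):
    """dt_m = sqrt(sum of dh_i^2 over the points i that lie in tract m).

    point_tract[i]  -- the 2020 tract of point i (plays the role of delta_{m,i})
    point_errors[i] -- dh_i, the propagated error at point i
    """
    squared = defaultdict(float)
    for tract, dh in zip(point_tract, point_errors):
        squared[tract] += dh ** 2
    return {tract: math.sqrt(s) for tract, s in squared.items()}


print(tract_errors(["A", "A", "B"], [3.0, 4.0, 2.0]))  # {'A': 5.0, 'B': 2.0}
```

Grouping by tract avoids materializing the mostly-zero delta_{m,i} matrix.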

Gastastrophe commented 2 years ago

As a note, since margins of error are only published for ACS data, ACS data must be used when propagating error. Consequently, since ACS does not report block-level data, we treat the census block populations as constants (dx_i = 0) and use the updated error function dt_m(a,b) = sum_{i=1}^n delta_{m,i} * w_i^2 * x_i^2 * |g(a,b)|^2 * ([da/a]^2 + [db/b]^2)

Gastastrophe commented 2 years ago

Closing since the propagated error is extremely high. The linked branch will remain open for future development.

Gastastrophe commented 1 year ago

Reopening as we attempt to find a different way to propagate error.

Gastastrophe commented 1 year ago

There is a square root missing from the final error calculation, so the correct function is in fact dt_m(a,b) = sqrt[ sum_{i=1}^n delta_{m,i} * w_i^2 * x_i^2 * (a/b)^2 * ([da/a]^2 + [db/b]^2) ]. This was done correctly in the implementation but was documented incorrectly.
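A minimal sketch of the corrected final formula, with the block populations x_i treated as constants (no dx_i term). The function and argument names are hypothetical; this is not the repository's implementation. As in the formula, a and b are single values shared by every point in the sum.

```python
import math


def final_tract_error(a, da, b, db, weights, pops, in_tract):
    """Corrected dt_m(a, b): square root of the summed squared point errors.

    weights[i] = w_i, pops[i] = x_i (treated as exact), in_tract[i] = delta_{m,i}.
    Assumes a and b are nonzero and their errors are independent.
    """
    ratio_sq = (a / b) ** 2
    rel_sq = (da / a) ** 2 + (db / b) ** 2
    total = sum(
        (w ** 2) * (x ** 2) * ratio_sq * rel_sq
        for w, x, d in zip(weights, pops, in_tract)
        if d
    )
    # This square root is the one missing from the earlier write-up.
    return math.sqrt(total)


# Illustrative numbers only; the third point lies outside tract m.
print(final_tract_error(300.0, 30.0, 1000.0, 0.0,
                        [0.5, 0.5, 1.0], [60.0, 80.0, 50.0],
                        [True, True, False]))  # approximately 1.5
```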

See the updated work in this picture

de-data-lab / census-tract-redistricting

Apply Propagation of Error to determine a measure of error #13