SimonMolinsky / pyinterpolate-paper

Paper materials for pyinterpolate package

Poisson Kriging System #1

Open Mujingrui opened 3 years ago

Mujingrui commented 3 years ago

Hi @szymon-datalions

Thank you for your excellent work. I am a little confused about the Poisson Kriging System in your code. [image] There is an error term m/n(vi), where m is the population-weighted mean of the N rates and n(vi) is the size of the population at risk. I am not sure how you program it in your code. Thank you for your time and any possible responses.

SimonMolinsky commented 3 years ago

Hi @Mujingrui ,

This term is present in parts related to the

and in each case it is the method calculate_weight_arr.

The method is written as follows:

    def calculate_weight_arr(self):
        vals_of_neigh_areas = self.prepared_data[:, 1]
        pop_of_neigh_areas = self.prepared_data[:, -1]

        # Weighted mean (m*)
        weighted = np.sum(vals_of_neigh_areas * pop_of_neigh_areas)
        weights_arr = weighted / np.sum(pop_of_neigh_areas)

        # Error term
        w = np.ones(shape=vals_of_neigh_areas.shape)
        w = (weights_arr * w) / pop_of_neigh_areas

        # Diagonal array
        return np.diag(w)

First, I calculate the population-weighted mean of the neighbors, then the error term (that mean divided by the population of each area), and finally the function returns an array of zeros with the error terms placed on the diagonal. This corresponds to the $\delta$ term in the equation (1 if an area is compared to itself and 0 otherwise). I hope it's clear.
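To make the three steps concrete, here is a minimal standalone sketch of the same calculation outside the class. The function name `poisson_error_matrix` and the sample numbers are made up for illustration and are not part of pyinterpolate; the sketch assumes that every population is greater than zero.

```python
import numpy as np


def poisson_error_matrix(rates, populations):
    """Return the diagonal matrix of error terms m* / n(v_i).

    Hypothetical helper for illustration only; assumes populations > 0.
    """
    rates = np.asarray(rates, dtype=float)
    populations = np.asarray(populations, dtype=float)

    # Population-weighted mean of the rates (m*)
    m_star = np.sum(rates * populations) / np.sum(populations)

    # Error term per area: m* divided by the population of that area
    error_terms = m_star / populations

    # Error terms on the diagonal, zeros elsewhere (the delta term)
    return np.diag(error_terms)


# Made-up rates and populations for three neighboring areas
rates = [10.0, 20.0, 30.0]
populations = [1000.0, 2000.0, 1000.0]
error_matrix = poisson_error_matrix(rates, populations)
print(error_matrix)
```

With these numbers the weighted mean is 20, so the diagonal holds 0.02, 0.01 and 0.02. The more populated area gets the smaller error term.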

Mujingrui commented 3 years ago

Hi, @szymon-datalions Thank you for your clear explanation. I just could not find the calculation of the population at risk, n(vi) = 100,000 * count of mortality cases / rate. Thank you for your time and any possible responses.

SimonMolinsky commented 3 years ago

Hi @Mujingrui ,

The code itself doesn't calculate incidence rates (100,000 * cases / total population); you must prepare the data in this format. The population per unit is calculated from the given shapefiles with population centroids / blocks. But a small function for the cases -> rates conversion could be useful, so I'll add it to the package.
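Until such a function exists, the conversion can be sketched in a few lines. The names `cases_to_rates` and `population_from_rates` are hypothetical and not part of pyinterpolate; the sketch assumes one case count and one positive population (or positive rate) per area, and uses the 100,000 multiplier mentioned above.

```python
import numpy as np


def cases_to_rates(cases, population, per=100_000):
    """Incidence rate per `per` inhabitants: per * cases / population.

    Hypothetical helper; assumes every population is greater than zero.
    """
    cases = np.asarray(cases, dtype=float)
    population = np.asarray(population, dtype=float)
    return per * cases / population


def population_from_rates(cases, rates, per=100_000):
    """Population at risk recovered as per * cases / rate.

    Hypothetical helper; assumes every rate is greater than zero.
    """
    cases = np.asarray(cases, dtype=float)
    rates = np.asarray(rates, dtype=float)
    return per * cases / rates


# Made-up counts for two areas
cases = [5, 20]
population = [50_000, 100_000]

rates = cases_to_rates(cases, population)
recovered_population = population_from_rates(cases, rates)
print(rates, recovered_population)
```

The second function is the inverse of the first, which matches the n(vi) formula quoted earlier in the thread.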

Mujingrui commented 3 years ago

Hi @szymon-datalions,

Thank you for your kind reply. I am just wondering whether this kind of dataset can always be prepared easily, because it is difficult to get the case counts and rates at each population centroid. Mostly, the number of cases and the rates are given for each region, not for each population centroid.

Thanks for your reply.

SimonMolinsky commented 3 years ago

Hi @Mujingrui ,

Do you mean that the data preparation (or, more precisely, data mining) step may be an obstacle that prevents users from using pyinterpolate? If so, this is a very important piece of information and I'll take it into account in the future. Thank you for your time!

Mujingrui commented 3 years ago

Hi, @szymon-datalions

Thank you for your reply. Yes, this is my point: sometimes it's difficult to collect the number of cases at each population centroid. The Public Health Department always gives us the number of cases in each area.

Cheers.

SimonMolinsky commented 3 years ago

I'll leave this issue open for some time because I'm working on a new project related to making those data sources more readily available directly from Python.