SimonMolinsky / pyinterpolate-paper

Paper materials for pyinterpolate package

Kriged Population at Risk Map #5

Mujingrui opened this issue 3 years ago

Mujingrui commented 3 years ago

Hi, @szymon-datalions

Thank you for reading my message. I am a little confused: it looks like you produced Figure 5 (breast cancer population-at-risk map in the Northeastern United States) and then transformed it to get Figure 6 (Area-to-Point Kriging breast cancer incidence rate map). May I ask what population size you used to calculate the rate?

SimonMolinsky commented 3 years ago

Hi @Mujingrui,

I'll describe this process as soon as I can. Could it be next week, if you are not in a hurry?

Thanks,

Mujingrui commented 3 years ago

@szymon-datalions

Thank you so much:)

Mujingrui commented 3 years ago

Hi, @szymon-datalions

Maybe dividing the estimated value at each point by its population and then multiplying by 100,000?

SimonMolinsky commented 3 years ago

> Thank you for reading my message. I am a little confused: it looks like you produced Figure 5 (breast cancer population-at-risk map in the Northeastern United States) and then transformed it to get Figure 6 (Area-to-Point Kriging breast cancer incidence rate map). May I ask what population size you used to calculate the rate?

Hi @Mujingrui , could you point me to the files where those figures are placed? The paper has changed and I am not able to find those examples.

However, if you mean the example HERE, then the rates are processed from the beginning of the calculations (rate = [people with cancer / all people in the area] * 100,000). The algorithm then normalizes the value at each point by dividing by the total population of the given area. Does that answer your question?
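A minimal sketch of that rate preprocessing, with hypothetical numbers (not from the paper's dataset):

```python
def incidence_rate_per_100k(cases, population):
    """Incidence rate per 100,000 inhabitants for one areal unit."""
    return (cases / population) * 100_000

# Hypothetical county: 412 cases among 310,000 residents.
rate = incidence_rate_per_100k(412, 310_000)
print(round(rate, 1))  # 132.9
```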

Mujingrui commented 3 years ago

Hi, @szymon-datalions

Thank you for your reply :) Figure 5 is here. I was just thinking that Figure 4 gives the population-at-risk map, and then a transformation gives the breast cancer rate map in Figure 5.

I tried to divide the estimated value at each point by the population at each point in cancer_population_base to get the rate.

Many thx:)

SimonMolinsky commented 3 years ago

Hi @Mujingrui ,

It's always the rate, but it is weighted across all the points (population blocks) in a given county. If you add up all the rates (points) within a specific county, you should obtain the incidence rate (and not the population-at-risk).

It just came to my mind that the estimated value from Area-to-Area Poisson Kriging should be the same* as the aggregated values from Area-to-Point Poisson Kriging. I think I should use a different example than just disease rate vs. population... In public health nomenclature the rate is usually weighted by some specific constant (10,000 or 100,000), which introduces more confusion into the explanation.

*or very close, due to floating-point precision
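That relationship can be illustrated with a toy check, assuming the ATP estimates are population-weighted shares of the areal rate (all names and numbers here are hypothetical, not pyinterpolate code):

```python
# Summing the Area-to-Point estimates over a county should recover the
# Area-to-Area estimate (up to floating-point error).
county_rate_ata = 140.0              # hypothetical ATA PK estimate
point_pops = [1200, 800, 500, 3500]  # population blocks in the county
total_pop = sum(point_pops)

# Each point receives the county rate weighted by its population share.
point_estimates = [county_rate_ata * p / total_pop for p in point_pops]

# The population shares sum to 1, so the point estimates aggregate back
# to the areal value.
assert abs(sum(point_estimates) - county_rate_ata) < 1e-9
```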

Mujingrui commented 3 years ago

Hi, @szymon-datalions

Thank you for your nice reply :) So Area-to-Point Poisson Kriging gives the estimated rate value at each point, not the population at risk.
I am just trying to understand the example more clearly so that I can then run the algorithm myself on my data. The smoothed output shows that the highest estimated value is 158.78, which is different from the highest breast cancer incidence rate of 200.7. That confused me, because the same problem also happens with my data: the estimated point values are much lower. I then spatially joined the smoothed point output with the hexagonal population polygons shapefile and divided the estimated value by the population/100,000 at each point.

Many thx!!!!

Mujingrui commented 3 years ago

Hi, @szymon-datalions

Thank you for your clear reply! I figured out that the estimated value at each point should be n(u_s)/n(v_alpha) * hat(r)(u_s), so hat(r)(u_s) = estimated value / (n(u_s)/n(v_alpha)). I am not sure whether my understanding is right.

Many thx!

SimonMolinsky commented 3 years ago

> So Area-to-Point Poisson Kriging gives the estimated rate value at each point, not the population at risk.

Not exactly :) ATP gives us the population at risk because it weights the county rate by the county population. We don't know whether the population inside the quadrant with the highest population density is the most infected one within the area, but we do know that high population density combined with high infection rates deserves more attention in the decision-making process. A choropleth map can be misleading because it shows the whole area as a "red zone" or a "green zone" even though only parts of the area are populated.

> I am just trying to understand the example more clearly so that I can then run the algorithm myself on my data. The smoothed output shows that the highest estimated value is 158.78, which is different from the highest breast cancer incidence rate of 200.7. That confused me, because the same problem also happens with my data: the estimated point values are much lower.

Ok, this is not a problem with the data or the algorithm; it is how semivariogram regularization works in this context. We are changing the input semivariogram, so it cannot behave like Ordinary Kriging: we don't get the exact values at the known locations. ATA and ATP Poisson Kriging generate smoothed maps with outliers removed. This could be dangerous in applications where those outliers (or hot spots) should be tracked, but that is a different issue. In my opinion (I'm not an expert), Kriging is not designed to tackle anomalies directly but rather to smooth and filter data, and possibly to assess the uncertainty of the interpolation. If you are looking for anomalies, I would move to time-series analysis instead of spatial interpolation.
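A minimal, library-free illustration of why a smoothed surface rarely reaches the observed maximum: when a prediction is a convex (non-negative, normalized) weighted average of neighbouring observations, it is pulled inward from the extremes. The weights below are made up for illustration; real kriging weights are solved from the semivariogram and can even be negative.

```python
# Hypothetical observed incidence rates at four neighbouring locations.
observed = [200.7, 110.0, 95.5, 158.2]

def weighted_average(values, weights):
    """Convex combination of the observations."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Illustrative weights (non-negative, summing to 1): the prediction
# stays strictly inside the observed range, so the peak is smoothed.
prediction = weighted_average(observed, [0.4, 0.3, 0.2, 0.1])
assert min(observed) <= prediction <= max(observed)
print(round(prediction, 1))  # 148.2
```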

SimonMolinsky commented 3 years ago

> I figured out that the estimated value at each point should be n(u_s)/n(v_alpha) * hat(r)(u_s), so hat(r)(u_s) = estimated value / (n(u_s)/n(v_alpha)).

It is right, with the accentuation that the estimated value != the (real) exact value of the areal aggregate, even if the latter is known :)
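The back-calculation confirmed above can be written out as a short sketch (all numbers are hypothetical):

```python
# An ATP estimate at a point u_s equals n(u_s)/n(v_alpha) * hat(r)(u_s),
# so the underlying rate is recovered by dividing by the population share.
n_us = 2500        # population of the point-support block u_s
n_valpha = 50_000  # total population of the area v_alpha
estimate = 7.5     # ATP Poisson Kriging output at u_s

population_share = n_us / n_valpha      # 0.05
r_hat = estimate / population_share     # back-calculated rate at u_s
print(r_hat)
```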

Mujingrui commented 3 years ago

Hi, @szymon-datalions

Many thx!!

I also ran into the problem of interpolation uncertainty, and I am trying to use Bayesian spatio-temporal analysis to tackle it. The reason I want to use ATP Poisson Kriging is that the algorithm produces a smoothed map that can show how the epidemic spreads around the study area. In public health, using plain Ordinary Kriging is not very reasonable, because the dataset is not like a mining dataset, which provides many data points with values.

> It is right, with the accentuation that the estimated value != the (real) exact value of the areal aggregate, even if the latter is known :)

Thx!! It is equal to the estimated aggregated areal rate from ATA Poisson Kriging.

Many thx again!!! Thank you for your work!!! I really appreciate it :) I keep trying to code the algorithm in R, but the runs take very long and end with errors.

Best,

SimonMolinsky commented 3 years ago

I don't know which type of disease you're working on, but I've worked with tick-borne diseases and their spread in Central Europe. There, the PK maps are an intermediate step in the pipeline: their first role is to filter out noise and then to derive the population-at-risk. With that, we fed the data into a pipeline with a Species Distribution Model of the vector and weighted each "population-at-risk unit" by the neighboring blocks of the SDM output (a probability map). Then the real "population-at-risk" used for decision-making was created. But it depends on the disease. As I've seen in the literature, cancer itself is rather a stationary phenomenon, and prevalence may be linked directly to the structure of the population (to "hidden" variables related to the population, not the population density). COVID or flu are the tricky ones, but we have also created some data with PK ATP - to put weights on specific city districts where infections may spread most rapidly.

But what I have seen throughout all those studies is that rarely is one technique good enough from the public health perspective... Now we are experimenting with spatially and temporally stamped graph networks, especially in the context of a rapidly evolving climate, where old vector data (pre-2000s) doesn't provide much insight into current problems.

Anyway, I'm glad to hear so much from you - I'm listening, and I'm open to discussing future changes to the package or developing something new. I'm also open to cooperating on projects related to public health, because it has become my hobby over the years :)

Mujingrui commented 3 years ago

Hi, @szymon-datalions

It's my pleasure to communicate with you here :) Thank you for so many nice replies. Many thanks again. I am working with COVID-19 data in Canada now. My idea is to make a kriged map to see how the pandemic spreads around the study area, and also to use a Bayesian spatio-temporal model to estimate areal risk and then apply a non-parametric method to detect high-risk zones.

> COVID or flu are the tricky ones, but we have also created some data with PK ATP - to put weights on specific city districts where infections may spread most rapidly.

Yeah, it's difficult. There are so many hidden variables that need to be considered. I tried the Google Mobility Data: the short-term prediction performance is not bad, but the long-term one is another story. What I am thinking is that public health is a very sophisticated system, and solving it is a big project. Population size, economics, hospital resources...

By the way, I found that how to build a hexagonal population map and then use it in ATP is important. It should be tried a few more times :) I hope we have more opportunities to communicate in the future :) Thank you for your generous help!!

Best,