AnushaPB / wingen

Continuous mapping of genetic diversity
Other
12 stars 0 forks source link

Suggestions to Prevent Extremes in Kriging Predictions #23

Closed alexkrohn closed 10 months ago

alexkrohn commented 10 months ago

Hi there!

I was wondering if you had advice on how to prevent the kriging algorithm from predicting extremes where it has little evidence or sampling.

Here is an example map that I have produced for a rare pond-breeding amphibian.

wingen-plot-ho

Wingen predicts the area with the highest genetic diversity will be in coastal South Carolina, even though adjacent samples are not as high. It seems like the kriging algorithm is putting too much weight on adjacent individuals that are "medium-high" and extrapolating that unsampled areas will be higher. It is possible that this is correct, but untested. It also seems unlikely given other patterns in the dataset, but I know the kriging algorithm does not consider that.

I've tried changing the krig_method, the disaggregation factor, and the rarefaction strategy (up to rarify_nit = 50, down to min_n = 1. None seem to affect this prediction in South Carolina.

Is this just an artifact of the algorithm, an artifact of the spotty sampling, or is there something I can do to improve the kriging process?

Thank you for your help!

AnushaPB commented 10 months ago

Hi, Thanks for reaching out!

Can you send the original (non-kriged) wingen map that you are basing this kriged map off of? Also, would it be possible for you to plot each individual sample colored by genetic diversity (e.g., Ho) and send that as well? This will give me a better sense of where that hot-spot is coming from. If there is any issue with posting those figures publicly, feel free to directly email me at anusha.bishop@berkeley.edu.

alexkrohn commented 10 months ago

Hi Anusha!

Interesting! Your suggestion to plot the non-kriged wingen map was very helpful. Here is the code and non-kriged map. (This is also the map from which the kriged plot is created in the original post.)

final.metadata <- read_tsv("metadata.tsv")
lat.longs <- final.metadata %>%
  dplyr::select(Longitude, Latitude) %>%
  rename(x = Longitude, y = Latitude) %>%
  st_as_sf(coords = c("x", "y"), crs = "+proj=longlat")

# Convert lat/long to raster dataset
background.raster <- coords_to_raster(lat.longs, buffer = 1, plot = TRUE, disagg = 10)

# Project everyone into planar
lat.longs.planar <- lat.longs %>%
  st_transform(crs = "+proj=goode")
raster.planar <- project(background.raster, "+proj=goode")

# Do window analysis
window.analysis <- window_gd(my.vcf,
                             lat.longs.planar,
                             raster.planar,
                             stat = c("pi", "Ho"),
                             rarify = TRUE,
                             rarify_nit = 50,
                             min_n = 1,
                             fact = 0,
                             crop_edges = TRUE
)

wingen-plot-no-kriging As you can see, the cells in thebackground.raster are not very representative of the scale of samples. In this map I, I disaggregate at the kriging step. By disaggregating at the coords_to_raster step, I can create a much better smoothed map (although it takes a lot longer to calculate/load).

Here is the map of unkriged Ho with disaggregation at the coords_to_raster step: wingen-plot-no-kriging-disagg-at-background-raster

And that same map kriged: wingen-plot-ho-disagg-at-background

This seems like a much more realistic approximation. Going forward, it seems that modifying the background.raster to the final raster that you want, rather than modifying the raster at the krig_gd step provides better results. Is that what you would recommend?

Thanks for your help.

AnushaPB commented 10 months ago

That looks great!

Yes, the moving window calculations occur during the window_gd() step and this is where it is important to carefully define the size of your window.

The size of your window is determined by two things: (1) your window dimensions (wdim) and (2) the size of the cells in your background raster (lyr). You have to modify (1) and (2) to create a window that makes sense for your study system (which you have done here by modifying the cells of (2) to be smaller). At the krig_gd() step you are no longer calculating genetic diversity in a window, you are only using a spatially explicit model to basically smooth out your results. At the krig_gd() step changing the background raster will just affect the resolution of the final predicted surface.